Synopsis

The goal of the project is to predict the quality of exercise activities. This project uses the data from accelerometers on the belt, forearm, arm, and dumbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).

The training data for this project are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

The test data are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv

This report describes the model, cross validation, and the expected out of sample error. Finally, the prediction model is used to predict 20 different test cases.

Data Processing

Load the training and testing data:

training = read.csv("pml-training.csv")
testing = read.csv("pml-testing.csv")
dim(training)
## [1] 19622   160
dim(testing)
## [1]  20 160

Check how many columns having more than 80% missing values (NAs) in the testing data set:

sum(sapply(testing, function(e) {sum(is.na(e)) / nrow(testing) > 0.8}))
## [1] 100

Remove the columns with more than 80% missing values from both the training and testing data sets:

missingCols  = sapply(testing, function(e) {sum(is.na(e)) / nrow(testing) > 0.8})
trainingNNa =  subset(training, select = !missingCols)
testingNNa = subset(testing, select = !missingCols)
dim(trainingNNa)
## [1] 19622    60
dim(testingNNa)
## [1] 20 60

What is the objective variable and its type?

str(trainingNNa$class)
##  Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(trainingNNa$class)
##    A    B    C    D    E 
## 5580 3797 3422 3216 3607

There are 5 different discrete values. By the nature of the objective variable, we will train a classification model for the prediction problem.

str(trainingNNa)
## 'data.frame':    19622 obs. of  60 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ user_name           : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ raw_timestamp_part_1: int  1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
##  $ raw_timestamp_part_2: int  788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
##  $ cvtd_timestamp      : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ new_window          : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ num_window          : int  11 11 11 12 12 12 12 12 12 12 ...
##  $ roll_belt           : num  1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
##  $ pitch_belt          : num  8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
##  $ yaw_belt            : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
##  $ total_accel_belt    : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ gyros_belt_x        : num  0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
##  $ gyros_belt_y        : num  0 0 0 0 0.02 0 0 0 0 0 ...
##  $ gyros_belt_z        : num  -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
##  $ accel_belt_x        : int  -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
##  $ accel_belt_y        : int  4 4 5 3 2 4 3 4 2 4 ...
##  $ accel_belt_z        : int  22 22 23 21 24 21 21 21 24 22 ...
##  $ magnet_belt_x       : int  -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
##  $ magnet_belt_y       : int  599 608 600 604 600 603 599 603 602 609 ...
##  $ magnet_belt_z       : int  -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
##  $ roll_arm            : num  -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
##  $ pitch_arm           : num  22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
##  $ yaw_arm             : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
##  $ total_accel_arm     : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ gyros_arm_x         : num  0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
##  $ gyros_arm_y         : num  0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
##  $ gyros_arm_z         : num  -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
##  $ accel_arm_x         : int  -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
##  $ accel_arm_y         : int  109 110 110 111 111 111 111 111 109 110 ...
##  $ accel_arm_z         : int  -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
##  $ magnet_arm_x        : int  -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
##  $ magnet_arm_y        : int  337 337 344 344 337 342 336 338 341 334 ...
##  $ magnet_arm_z        : int  516 513 513 512 506 513 509 510 518 516 ...
##  $ roll_dumbbell       : num  13.1 13.1 12.9 13.4 13.4 ...
##  $ pitch_dumbbell      : num  -70.5 -70.6 -70.3 -70.4 -70.4 ...
##  $ yaw_dumbbell        : num  -84.9 -84.7 -85.1 -84.9 -84.9 ...
##  $ total_accel_dumbbell: int  37 37 37 37 37 37 37 37 37 37 ...
##  $ gyros_dumbbell_x    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ gyros_dumbbell_y    : num  -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
##  $ gyros_dumbbell_z    : num  0 0 0 -0.02 0 0 0 0 0 0 ...
##  $ accel_dumbbell_x    : int  -234 -233 -232 -232 -233 -234 -232 -234 -232 -235 ...
##  $ accel_dumbbell_y    : int  47 47 46 48 48 48 47 46 47 48 ...
##  $ accel_dumbbell_z    : int  -271 -269 -270 -269 -270 -269 -270 -272 -269 -270 ...
##  $ magnet_dumbbell_x   : int  -559 -555 -561 -552 -554 -558 -551 -555 -549 -558 ...
##  $ magnet_dumbbell_y   : int  293 296 298 303 292 294 295 300 292 291 ...
##  $ magnet_dumbbell_z   : num  -65 -64 -63 -60 -68 -66 -70 -74 -65 -69 ...
##  $ roll_forearm        : num  28.4 28.3 28.3 28.1 28 27.9 27.9 27.8 27.7 27.7 ...
##  $ pitch_forearm       : num  -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.8 -63.8 -63.8 ...
##  $ yaw_forearm         : num  -153 -153 -152 -152 -152 -152 -152 -152 -152 -152 ...
##  $ total_accel_forearm : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ gyros_forearm_x     : num  0.03 0.02 0.03 0.02 0.02 0.02 0.02 0.02 0.03 0.02 ...
##  $ gyros_forearm_y     : num  0 0 -0.02 -0.02 0 -0.02 0 -0.02 0 0 ...
##  $ gyros_forearm_z     : num  -0.02 -0.02 0 0 -0.02 -0.03 -0.02 0 -0.02 -0.02 ...
##  $ accel_forearm_x     : int  192 192 196 189 189 193 195 193 193 190 ...
##  $ accel_forearm_y     : int  203 203 204 206 206 203 205 205 204 205 ...
##  $ accel_forearm_z     : int  -215 -216 -213 -214 -214 -215 -215 -213 -214 -215 ...
##  $ magnet_forearm_x    : int  -17 -18 -18 -16 -17 -9 -18 -9 -16 -22 ...
##  $ magnet_forearm_y    : num  654 661 658 658 655 660 659 660 653 656 ...
##  $ magnet_forearm_z    : num  476 473 469 469 473 478 470 474 476 473 ...
##  $ classe              : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

Remove the first 7 variables that don’t provide predicting values.

trainingNNa = trainingNNa[, -c(1:7)]
testingNNa = testingNNa[, -c(1:7)]
dim(trainingNNa)
## [1] 19622    53
dim(testingNNa)
## [1] 20 53

Visualizations

Randomly pick up 5 variables and create a scatter-plot matrix:

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
featurePlot(x = trainingNNa[, c(1, 10, 20, 35, 45)], y = trainingNNa$class, plot = "pairs", auto.key = list(columns = 5))

Pre-Processing

Separate the predictors and class labels.

trainingVars = trainingNNa[, c(1:52)]
objLabels = trainingNNa[, 53]
testingVars = testingNNa[, c(1:52)]
testingProdIds = testingNNa[, 53]

Center and scale both training and testing data

library(caret)
preproc = preProcess(trainingVars)
trainingVarsProc = predict(preproc, trainingVars)
testingVarsProc = predict(preproc, testingVars)

Are there any near-zero-variance variables?

nearZeroVar(trainingVarsProc)
## integer(0)

Are there highly correlated variables?

descrCor <- cor(trainingVarsProc)
highCorr <- sum(abs(descrCor[upper.tri(descrCor)]) > 0.75)
highCorr
## [1] 31

Remove highly correlated variables

highlyCorDescr <- findCorrelation(descrCor, cutoff = 0.75)
trainingVarsProcNCor = trainingVarsProc[, -highlyCorDescr]
testingVarsProcNCor = testingVarsProc[, -highlyCorDescr]
dim(trainingVarsProcNCor)
## [1] 19622    32
dim(testingVarsProcNCor)
## [1] 20 32

Training and Tuning through Cross Validation

Data-Spliting

The pre-processed training data is split into two subsets: 75% for training prediction models and 25% as validation set for evaluating error rates and selecting models.

inTrain = createDataPartition(objLabels, p = 0.75, list = FALSE)
forTrainingVars = trainingVarsProcNCor[inTrain, ]
forTestingVars = trainingVarsProcNCor[-inTrain, ]
forTrainingLabels = objLabels[inTrain]
forTestingLabels = objLabels[-inTrain]

Estimating a Baseline Error Rate

For this classification problem, I first train a decision tree and evaluate its error rate on the validation set. I will use the error rate of the decision tree as a baseline error rate. In the later part of the document, I will train other models and use cross validation to select the best one.

Train a decision tree using the caret package.

set.seed(2345)
modeldt = train(y = forTrainingLabels, x = forTrainingVars, method = "rpart")

Predict the labels for the validation set and estimate the error rate.

preddt = predict(modeldt, forTestingVars)
## Loading required package: rpart
confusionMatrix(forTestingLabels, preddt)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   A   B   C   D   E
##          A 854 249 231  61   0
##          B 141 585 186  31   6
##          C  25 173 610  46   1
##          D  33 243 191 230 107
##          E  17 363 133  35 353
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5367          
##                  95% CI : (0.5226, 0.5507)
##     No Information Rate : 0.3289          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4174          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.7981   0.3627   0.4515  0.57072  0.75589
## Specificity            0.8589   0.8894   0.9310  0.87247  0.87649
## Pos Pred Value         0.6122   0.6164   0.7135  0.28607  0.39179
## Neg Pred Value         0.9384   0.7401   0.8170  0.95780  0.97152
## Prevalence             0.2182   0.3289   0.2755  0.08218  0.09523
## Detection Rate         0.1741   0.1193   0.1244  0.04690  0.07198
## Detection Prevalence   0.2845   0.1935   0.1743  0.16395  0.18373
## Balanced Accuracy      0.8285   0.6260   0.6913  0.72160  0.81619

The accuracy of the decision tree is 0.537. To improve the accuracy, I will train several different models in the following.

Training Different Models and Estimating Error Rates

Will Naive Bayes Model Improve the Accuracy?

Train a naive Bayes model and estimate its accuracy. To speed up the training process, I only use 10% of the training data.

set.seed(2345)
library(klaR)
## Loading required package: MASS
library(MASS)
inTrain = createDataPartition(forTrainingLabels, p = .1, list = FALSE)
smallTrainingVars = forTrainingVars[inTrain, ]
smallTrainingLabels = forTrainingLabels[inTrain]
modelnb = train(y = smallTrainingLabels, x = smallTrainingVars, method = "nb", )
prednb = predict(modelnb, forTestingVars)
confusionMatrix(forTestingLabels, prednb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   A   B   C   D   E
##          A 901  88 257 147   2
##          B  80 608 156  87  18
##          C  65  64 648  72   6
##          D  39  19 199 505  42
##          E  41 109  96  75 580
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6611          
##                  95% CI : (0.6477, 0.6743)
##     No Information Rate : 0.2765          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5751          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.8002   0.6847   0.4779   0.5700   0.8951
## Specificity            0.8692   0.9151   0.9417   0.9256   0.9246
## Pos Pred Value         0.6459   0.6407   0.7579   0.6281   0.6437
## Neg Pred Value         0.9359   0.9292   0.8251   0.9071   0.9830
## Prevalence             0.2296   0.1811   0.2765   0.1807   0.1321
## Detection Rate         0.1837   0.1240   0.1321   0.1030   0.1183
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.8347   0.7999   0.7098   0.7478   0.9098

The accuracy of the naive Bayes model is 0.66 which is better than that (0.537) of the decision tree although only 10% of the training data is used.

Building Random Forest Models

Train a random forest model using the 10% of the training data as used in training the naive Bayes model.

modelrf = train(y = smallTrainingLabels, x = smallTrainingVars, method = "rf", )
## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
predrf = predict(modelrf, forTestingVars)
confusionMatrix(forTestingLabels, predrf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1365   11    4   14    1
##          B   47  831   45   23    3
##          C    0   58  767   29    1
##          D    1    2   45  755    1
##          E    2   34    8   16  841
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9296          
##                  95% CI : (0.9221, 0.9367)
##     No Information Rate : 0.2885          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.911           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9647   0.8878   0.8826   0.9020   0.9929
## Specificity            0.9914   0.9703   0.9782   0.9880   0.9852
## Pos Pred Value         0.9785   0.8757   0.8971   0.9391   0.9334
## Neg Pred Value         0.9858   0.9735   0.9748   0.9800   0.9985
## Prevalence             0.2885   0.1909   0.1772   0.1707   0.1727
## Detection Rate         0.2783   0.1695   0.1564   0.1540   0.1715
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.9780   0.9290   0.9304   0.9450   0.9891

Using only the 10% of the training data, the accuracy of the random forest model is 0.929 which is much better than that (0.66) of the naive Bayes model. Let us train a random forest model using all the available training data.

modelrfall = train(y = forTrainingLabels, x = forTrainingVars, method = "rf", )
predrfall = predict(modelrfall, forTestingVars)
confusionMatrix(forTestingLabels, predrfall)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1395    0    0    0    0
##          B    2  944    3    0    0
##          C    0    7  844    4    0
##          D    0    0    6  798    0
##          E    0    0    0    1  900
## 
## Overall Statistics
##                                         
##                Accuracy : 0.9953        
##                  95% CI : (0.993, 0.997)
##     No Information Rate : 0.2849        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.9941        
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9986   0.9926   0.9894   0.9938   1.0000
## Specificity            1.0000   0.9987   0.9973   0.9985   0.9998
## Pos Pred Value         1.0000   0.9947   0.9871   0.9925   0.9989
## Neg Pred Value         0.9994   0.9982   0.9978   0.9988   1.0000
## Prevalence             0.2849   0.1939   0.1739   0.1637   0.1835
## Detection Rate         0.2845   0.1925   0.1721   0.1627   0.1835
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.9993   0.9957   0.9934   0.9962   0.9999

Great! The accuracy of the random forest model trained by using all the available training data is 0.99.

Will Feature Selection Help?

I am curious about whether we can select a smaller set of features/predictors to further improve the training process, e.g., better accuracy or faster training time.

First, create an integer vector for the specific subset sizes of the predictors that should be tested.

subsets <- c(1:5, 10, 15, 20, 25)

Use recursive feature elimination via caret to find the important features. We train a series random forest models and select features through repeated cross validations on different sizes of feature sets.

set.seed(2345)

library(Hmisc)
## Loading required package: grid
## Loading required package: survival
## Loading required package: splines
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:caret':
## 
##     cluster
## 
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following object is masked from 'package:randomForest':
## 
##     combine
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
ctrl <- rfeControl(functions = rfFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)

rfProfile <- rfe(smallTrainingVars, smallTrainingLabels, 
                 sizes = subsets,
                 rfeControl = ctrl)
## 
## Attaching package: 'plyr'
## 
## The following objects are masked from 'package:Hmisc':
## 
##     is.discrete, summarize
rfProfile
## 
## Recursive feature selection
## 
## Outer resampling method: Cross-Validated (10 fold, repeated 5 times) 
## 
## Resampling performance over subset size:
## 
##  Variables Accuracy  Kappa AccuracySD KappaSD Selected
##          1   0.5007 0.3634    0.08638 0.10988         
##          2   0.6920 0.6091    0.05889 0.07496         
##          3   0.7794 0.7203    0.04397 0.05578         
##          4   0.8158 0.7667    0.03241 0.04093         
##          5   0.8384 0.7954    0.03091 0.03921         
##         10   0.8914 0.8625    0.02705 0.03435         
##         15   0.9138 0.8909    0.02042 0.02589         
##         20   0.9166 0.8943    0.02326 0.02949        *
##         25   0.9159 0.8935    0.02173 0.02752         
##         32   0.9145 0.8917    0.01966 0.02490         
## 
## The top 5 variables (out of 20):
##    yaw_belt, magnet_dumbbell_z, magnet_belt_z, pitch_forearm, roll_dumbbell
OK. Let Us Train a Random Forest Model with only Five Variables

Extract the data.

featureEliTrainingVars = forTrainingVars[, c("magnet_belt_z", "magnet_dumbbell_z", "yaw_belt", "roll_dumbbell", "pitch_forearm")]
featureEliTestingVars = forTestingVars[, c("magnet_belt_z", "magnet_dumbbell_z", "yaw_belt", "roll_dumbbell", "pitch_forearm")]
featureEliFinalTestingVars = testingVarsProcNCor[, c("magnet_belt_z", "magnet_dumbbell_z", "yaw_belt", "roll_dumbbell", "pitch_forearm")]
dim(featureEliTrainingVars)
## [1] 14718     5
dim(featureEliTestingVars)
## [1] 4904    5
dim(featureEliFinalTestingVars)
## [1] 20  5

Train a random forest model using only 5 variables.

modelrf5vars = train(y = forTrainingLabels, x = featureEliTrainingVars, method = "rf", )
predrf5vars = predict(modelrf5vars, featureEliTestingVars)
confusionMatrix(forTestingLabels, predrf5vars)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1344   23   12   15    1
##          B   28  865   35   20    1
##          C    8   24  789   34    0
##          D   10    8   25  759    2
##          E    0   11    6    5  879
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9454          
##                  95% CI : (0.9386, 0.9515)
##     No Information Rate : 0.2834          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9309          
##  Mcnemar's Test P-Value : 0.002189        
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9669   0.9291   0.9100   0.9112   0.9955
## Specificity            0.9855   0.9789   0.9837   0.9889   0.9945
## Pos Pred Value         0.9634   0.9115   0.9228   0.9440   0.9756
## Neg Pred Value         0.9869   0.9833   0.9807   0.9820   0.9990
## Prevalence             0.2834   0.1898   0.1768   0.1699   0.1801
## Detection Rate         0.2741   0.1764   0.1609   0.1548   0.1792
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.9762   0.9540   0.9468   0.9501   0.9950

The accuracy of the model trained by only 5 variables is 0.95.

Estimated Out of Sample Error Rate

According to the results of training, tuning, and testing different models on the set of validation data, I estimate the out of sample error rate is 0.99 with 95% confidence interval [0.993, 0.997].

Testing Results

Apply the random forest models trained by the full set of variables and the reduced set of variables to the test set with 20 observations. Check their prediction agreement.

resultsrf5vars = predict(modelrf5vars, featureEliFinalTestingVars)
## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
resultsrfall = predict(modelrfall, testingVarsProcNCor)
confusionMatrix(resultsrf5vars, resultsrfall)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction A B C D E
##          A 6 0 0 0 0
##          B 0 8 0 0 0
##          C 1 0 1 0 1
##          D 0 0 0 1 0
##          E 0 0 0 0 2
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9            
##                  95% CI : (0.683, 0.9877)
##     No Information Rate : 0.4            
##     P-Value [Acc > NIR] : 5.041e-06      
##                                          
##                   Kappa : 0.8592         
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.8571      1.0   1.0000     1.00   0.6667
## Specificity            1.0000      1.0   0.8947     1.00   1.0000
## Pos Pred Value         1.0000      1.0   0.3333     1.00   1.0000
## Neg Pred Value         0.9286      1.0   1.0000     1.00   0.9444
## Prevalence             0.3500      0.4   0.0500     0.05   0.1500
## Detection Rate         0.3000      0.4   0.0500     0.05   0.1000
## Detection Prevalence   0.3000      0.4   0.1500     0.05   0.1000
## Balanced Accuracy      0.9286      1.0   0.9474     1.00   0.8333

The two models agree about 90% of the test cases.

The final submission of the 20 cases predicted by the random forest model trained on the full set of the variables gets 100% accuracy.