Import Libraries

## Loading required package: lattice
## Loading required package: ggplot2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: tibble
## Loading required package: bitops
## Rattle: A free graphical interface for data science with R.
## Version 5.4.0 Copyright (c) 2006-2020 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:rattle':
## 
##     importance
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin

Machine Learning Algorithms for Prediction

  • Decision Tree
  • Random Forest

Decision Tree results:

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1202   24   96    0    5
##          B  397  276  218   34    1
##          C  371   35  395    0    0
##          D  351   10  292  137    0
##          E  184  149  221   36  262
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4838          
##                  95% CI : (0.4694, 0.4982)
##     No Information Rate : 0.5334          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3264          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.4798  0.55870  0.32324  0.66184  0.97761
## Specificity            0.9429  0.84531  0.88313  0.85453  0.86676
## Pos Pred Value         0.9058  0.29806  0.49313  0.17342  0.30751
## Neg Pred Value         0.6132  0.94218  0.78768  0.98208  0.99844
## Prevalence             0.5334  0.10520  0.26022  0.04408  0.05707
## Detection Rate         0.2560  0.05877  0.08411  0.02917  0.05579
## Detection Prevalence   0.2826  0.19719  0.17057  0.16823  0.18143
## Balanced Accuracy      0.7114  0.70201  0.60319  0.75818  0.92218

Random Forest results:

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1327    0    0    0    0
##          B    0  926    0    0    0
##          C    0    1  800    0    0
##          D    0    0    1  789    0
##          E    0    0    0    0  852
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9996          
##                  95% CI : (0.9985, 0.9999)
##     No Information Rate : 0.2826          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9995          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9989   0.9988   1.0000   1.0000
## Specificity            1.0000   1.0000   0.9997   0.9997   1.0000
## Pos Pred Value         1.0000   1.0000   0.9988   0.9987   1.0000
## Neg Pred Value         1.0000   0.9997   0.9997   1.0000   1.0000
## Prevalence             0.2826   0.1974   0.1706   0.1680   0.1814
## Detection Rate         0.2826   0.1972   0.1704   0.1680   0.1814
## Detection Prevalence   0.2826   0.1972   0.1706   0.1682   0.1814
## Balanced Accuracy      1.0000   0.9995   0.9992   0.9999   1.0000
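
The report shows only the output of the two models, not the code that produced it. As a rough illustration of how such a comparison could be set up, here is a minimal sketch that fits a decision tree and a random forest and evaluates both with `confusionMatrix()`. It is not the author's code: the built-in `iris` data stands in for the report's data set, and the 70/30 split, the seed, and all object names (`training`, `testing`, `fit_tree`, `fit_rf`) are assumptions made for the example.

```r
# Minimal sketch on the built-in iris data (a stand-in; the report's data is not shown).
library(caret)         # createDataPartition(), confusionMatrix()
library(rpart)         # decision tree
library(randomForest)  # random forest

set.seed(123)  # arbitrary seed, only for reproducibility

# Hold out 30% of the rows for evaluation (split ratio is an assumption)
in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Fit both models on the training split
fit_tree <- rpart(Species ~ ., data = training, method = "class")
fit_rf   <- randomForest(Species ~ ., data = training)

# Predict classes for the held-out rows
pred_tree <- predict(fit_tree, newdata = testing, type = "class")
pred_rf   <- predict(fit_rf, newdata = testing)

# 'data' takes the predictions, 'reference' takes the true labels
cm_tree <- confusionMatrix(data = pred_tree, reference = testing$Species)
cm_rf   <- confusionMatrix(data = pred_rf, reference = testing$Species)

# Compare overall accuracy of the two models
acc <- c(tree = unname(cm_tree$overall["Accuracy"]),
         rf   = unname(cm_rf$overall["Accuracy"]))
print(acc)
```

Passing predictions as `data` and the true labels as `reference` matters: swapping them leaves accuracy unchanged but changes the per-class statistics and the No Information Rate.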

Result

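  • The confusion matrices show that the random forest clearly outperforms the decision tree: accuracy is 0.9996 (kappa 0.9995) for the random forest versus 0.4838 (kappa 0.3264) for the decision tree. The random forest model should therefore be used for the final prediction.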
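
The two accuracy values can be recomputed directly from the printed confusion matrices, since accuracy is the sum of the diagonal divided by the total count. The sketch below does this in base R; the object names (`cm_first`, `cm_second`, `accuracy`) are introduced here for illustration only.

```r
# Confusion matrices copied from the two outputs in this report
# (rows = "Prediction", columns = "Reference", as printed)
classes <- c("A", "B", "C", "D", "E")
labels  <- list(Prediction = classes, Reference = classes)

cm_first <- matrix(c(1202,  24,  96,   0,   5,
                      397, 276, 218,  34,   1,
                      371,  35, 395,   0,   0,
                      351,  10, 292, 137,   0,
                      184, 149, 221,  36, 262),
                   nrow = 5, byrow = TRUE, dimnames = labels)

cm_second <- matrix(c(1327,   0,   0,   0,   0,
                         0, 926,   0,   0,   0,
                         0,   1, 800,   0,   0,
                         0,   0,   1, 789,   0,
                         0,   0,   0,   0, 852),
                    nrow = 5, byrow = TRUE, dimnames = labels)

# Accuracy = correctly classified cases / all cases
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

round(accuracy(cm_first), 4)   # 0.4838 (2272 / 4696)
round(accuracy(cm_second), 4)  # 0.9996 (4694 / 4696)

# Row totals of both matrices, for comparison
rowSums(cm_first)
rowSums(cm_second)
```

Both matrices have identical row totals (1327, 926, 801, 790, 852). That is what one would expect if the rows held the true labels of a shared test set, so the `data` and `reference` arguments of `confusionMatrix()` may have been swapped. If so, the accuracy values are unaffected, but sensitivity and positive predictive value would trade places, and the No Information Rate of 0.5334 in the first output would reflect the predicted rather than the true class distribution. This cannot be confirmed without the original code.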

Conclusion

Predicted classes for the 20 test cases, using the random forest model:

##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
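
A vector of predicted classes like the one above is what `predict()` returns when a fitted random forest is applied to unlabeled cases. The following is a minimal sketch of that step, again using `iris` as a stand-in; `fit_rf` and `new_cases` are hypothetical names, and the three rows chosen are arbitrary.

```r
# Minimal sketch: predicting classes for new, unlabeled cases
library(randomForest)

set.seed(123)  # arbitrary seed, only for reproducibility

# Fit on labeled data (iris is a stand-in for the report's training data)
fit_rf <- randomForest(Species ~ ., data = iris)

# Hypothetical new cases: same predictor columns, outcome column removed
new_cases <- iris[c(1, 51, 101), names(iris) != "Species"]

# Returns a factor with one predicted class per row of new_cases
final_pred <- predict(fit_rf, newdata = new_cases)
print(final_pred)
```

The new data must contain the same predictor columns, with the same names and types, as the data used for fitting.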