Complete all exercises and submit your answers to VtopBeta.

Datasets

## Loading packages
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(knitr)
library(mlbench)
Iris dataset for training and testing
Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa

Visualization

We use the Iris dataset to compare several classification algorithms under the same resampling scheme, and then use visualization techniques to show which of them performs best.
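
As a rough illustration of what "repeated cross-validation" means here, the sketch below builds the held-out index sets for 10 folds repeated 3 times using base R only. It is a simplified assumption of how the folds are formed: the names `fold_ids` and `holdout_sets` are made up for this example, and the folds are plain random splits rather than the class-stratified ones caret creates.

```r
# Simplified sketch of repeated 10-fold CV index generation (base R only).
# Assumption: plain random folds; caret stratifies by class, this does not.
set.seed(7)
n <- nrow(iris)   # 150 rows
k <- 10           # folds per repeat
repeats <- 3      # number of repeats

# For each repeat, shuffle a balanced vector of fold labels 1..k
fold_ids <- lapply(seq_len(repeats), function(r) {
  sample(rep(seq_len(k), length.out = n))
})

# One held-out index set per (repeat, fold) pair
holdout_sets <- unlist(
  lapply(fold_ids, function(ids) split(seq_len(n), ids)),
  recursive = FALSE
)

length(holdout_sets)            # number of resamples
unique(lengths(holdout_sets))   # rows held out in each resample
```

With 150 rows and 10 folds, every held-out set has 15 rows, so each per-resample accuracy is a multiple of 1/15. Values such as 0.8667 (13/15) and 0.9333 (14/15) in the summary output are consistent with that.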

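# 10-fold cross-validation repeated 3 times (30 resamples per model)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Reset the seed before each fit so that every model is evaluated on the same folds
set.seed(7)
fit.svm <- train(Species ~ ., data = iris, method = "svmRadial", trControl = control)
set.seed(7)
fit.knn <- train(Species ~ ., data = iris, method = "knn", trControl = control)
set.seed(7)
fit.rf <- train(Species ~ ., data = iris, method = "rf", trControl = control)
set.seed(7)
fit.nb <- train(Species ~ ., data = iris, method = "nb", trControl = control)
# Note: "treebag" fits bagged CART trees rather than a single decision tree
set.seed(7)
fit.decisionTree <- train(Species ~ ., data = iris, method = "treebag", trControl = control)

# Collect the resampling results of all five models for comparison
results <- resamples(list(DecisionTree = fit.decisionTree, NaiveBayes = fit.nb,
                          SVM = fit.svm, KNN = fit.knn, RF = fit.rf))
summary(results)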
## 
## Call:
## summary.resamples(object = results)
## 
## Models: DecisionTree, NaiveBayes, SVM, KNN, RF 
## Number of resamples: 30 
## 
## Accuracy 
##                   Min.   1st Qu.    Median      Mean 3rd Qu. Max. NA's
## DecisionTree 0.8000000 0.9333333 0.9666667 0.9511111       1    1    0
## NaiveBayes   0.8000000 0.9333333 1.0000000 0.9600000       1    1    0
## SVM          0.8666667 0.9333333 1.0000000 0.9666667       1    1    0
## KNN          0.8666667 0.9333333 1.0000000 0.9755556       1    1    0
## RF           0.8000000 0.9333333 1.0000000 0.9555556       1    1    0
## 
## Kappa 
##              Min. 1st Qu. Median      Mean 3rd Qu. Max. NA's
## DecisionTree  0.7     0.9   0.95 0.9266667       1    1    0
## NaiveBayes    0.7     0.9   1.00 0.9400000       1    1    0
## SVM           0.8     0.9   1.00 0.9500000       1    1    0
## KNN           0.8     0.9   1.00 0.9633333       1    1    0
## RF            0.7     0.9   1.00 0.9333333       1    1    0
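
To make the summary easier to read, the mean accuracies can be ranked directly. The sketch below copies the Mean column from the Accuracy table above into a named vector (`mean_accuracy` and `ranked` are names chosen for this example); with the live `results` object, the same column could presumably be read from `summary(results)$statistics$Accuracy[, "Mean"]` instead.

```r
# Mean accuracy per model, copied from the Accuracy table in the summary output
mean_accuracy <- c(
  DecisionTree = 0.9511111,
  NaiveBayes   = 0.9600000,
  SVM          = 0.9666667,
  KNN          = 0.9755556,
  RF           = 0.9555556
)

# Rank the models from highest to lowest mean accuracy
ranked <- sort(mean_accuracy, decreasing = TRUE)
ranked

# Gap between the best and the worst model
spread <- max(mean_accuracy) - min(mean_accuracy)
spread
```

The ranking is KNN, SVM, NaiveBayes, RF, DecisionTree, and the gap between the first and last model is about 0.024, which is less than two of the 15 held-out observations in a fold.
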
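# Let each panel (Accuracy, Kappa) use its own axis ranges
scales <- list(x=list(relation="free"), y=list(relation="free"))
# Box-and-whisker plots of the resampled metrics per model
bwplot(results,scales=scales)

# Density of the resampled metrics per model
densityplot(results, scales=scales, pch = "|")

# Mean and confidence interval per model
dotplot(results, scales=scales)

# One line per resample, showing how each fold scores across models
parallelplot(results)

# Scatterplot matrix comparing the models resample by resample
splom(results)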

# Pairwise differences between the models, with Bonferroni-adjusted p-values
diffs <- diff(results)
summary(diffs)
## 
## Call:
## summary.diff.resamples(object = diffs)
## 
## p-value adjustment: bonferroni 
## Upper diagonal: estimates of the difference
## Lower diagonal: p-value for H0: difference = 0
## 
## Accuracy 
##              DecisionTree NaiveBayes SVM       KNN       RF       
## DecisionTree              -0.008889  -0.015556 -0.024444 -0.004444
## NaiveBayes   1.0000                  -0.006667 -0.015556  0.004444
## SVM          0.1687       0.8307               -0.008889  0.011111
## KNN          0.1366       0.6984     1.0000               0.020000
## RF           1.0000       1.0000     0.5731    0.1737             
## 
## Kappa 
##              DecisionTree NaiveBayes SVM       KNN       RF       
## DecisionTree              -0.013333  -0.023333 -0.036667 -0.006667
## NaiveBayes   1.0000                  -0.010000 -0.023333  0.006667
## SVM          0.1687       0.8307               -0.013333  0.016667
## KNN          0.1366       0.6984     1.0000               0.030000
## RF           1.0000       1.0000     0.5731    0.1737
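
The Bonferroni adjustment reported above multiplies each raw p-value by the number of comparisons and caps the result at 1. A minimal sketch with base R's `p.adjust` follows. The raw p-values are illustrative, not taken from the analysis, and the assumption is that the adjustment is applied across the 10 model pairs that 5 models produce.

```r
# Number of pairwise comparisons among 5 models
n_pairs <- choose(5, 2)

# Illustrative raw p-values (hypothetical, not taken from the analysis above)
raw_p <- c(0.01366, 0.2, 0.5)

# Bonferroni: multiply by the number of comparisons and cap at 1
adjusted <- p.adjust(raw_p, method = "bonferroni", n = n_pairs)
adjusted
```

Under that assumption, a raw p-value of about 0.014 would look significant on its own but becomes about 0.137 after adjustment, which is why many entries in the lower diagonal are large or exactly 1.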

Inference:

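Based on the resampling results and the plots above, KNN has the highest mean accuracy (0.976), followed by SVM (0.967) and Naïve Bayes (0.960), while the Decision Tree model (fitted here with "treebag", i.e. bagged trees) has the lowest (0.951). The differences are small, however: none of the Bonferroni-adjusted pairwise p-values falls below 0.05 (the smallest is 0.1366, for KNN versus Decision Tree), so these results do not show that any one classifier is significantly better than another on this dataset.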