Exercise 5

Datasets

## Loading packages
library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

library(knitr)
library(mlbench)

Iris dataset used for training and testing (first five rows)
Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
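The preview shows only five rows. A quick base-R check of the full dataset (iris ships with R, so no extra packages are assumed) shows 150 rows and exactly 50 per species. That balance matters later, because stratified resampling folds can then hold equal numbers of each class.

```r
# iris is part of base R (datasets package); no extra packages needed
head(iris, 5)          # the five rows previewed above
dim(iris)              # 150 rows, 5 columns
table(iris$Species)    # 50 rows per species
```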

Visualization

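The iris dataset is used to compare five classification algorithms (bagged decision trees, naive Bayes, a radial-kernel SVM, k-nearest neighbours and random forest) under 10-fold cross-validation repeated three times. The resampled Accuracy and Kappa of each model are then summarised and visualised to show which algorithm performs best.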
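The `train()` calls below use `trainControl(method = "repeatedcv", number = 10, repeats = 3)`, so each model is scored on 10 held-out folds in each of 3 repeats, which is where the "Number of resamples: 30" in the summary comes from. The base-R sketch below illustrates stratified repeated k-fold splitting on iris. `stratified_folds` is a hypothetical helper written for illustration; caret's own fold generation differs in detail.

```r
# Hypothetical helper: assign each row a fold id 1..k, stratified by class
stratified_folds <- function(y, k) {
  fold <- integer(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    # spread this class evenly over the k folds, in random order
    fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  fold
}

set.seed(7)
k <- 10
repeats <- 3
held_out <- list()   # one vector of held-out row indices per resample
for (r in seq_len(repeats)) {
  fold <- stratified_folds(iris$Species, k)
  for (f in seq_len(k)) held_out[[length(held_out) + 1]] <- which(fold == f)
}

length(held_out)                     # 30 resamples
table(iris$Species[held_out[[1]]])   # 5 rows of each species in one held-out fold
```

Within each repeat, every row is held out exactly once, so the 30 scores per model come from three different partitions of the same 150 rows.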

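# 10-fold cross-validation repeated 3 times (30 resamples per model)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Re-seeding before each call gives every model the same folds,
# so resamples() can compare the models fold by fold
set.seed(7)
fit.svm <- train(Species ~ ., data = iris, method = "svmRadial", trControl = control)  # needs kernlab
set.seed(7)
fit.knn <- train(Species ~ ., data = iris, method = "knn", trControl = control)
set.seed(7)
fit.rf <- train(Species ~ ., data = iris, method = "rf", trControl = control)  # needs randomForest
set.seed(7)
fit.nb <- train(Species ~ ., data = iris, method = "nb", trControl = control)  # needs klaR
set.seed(7)
# "treebag" is bagged CART (needs ipred), not a single decision tree
fit.decisionTree <- train(Species ~ ., data = iris, method = "treebag", trControl = control)

# Collect the resampled Accuracy and Kappa of all five models
results <- resamples(list(DecisionTree = fit.decisionTree, NaiveBayes = fit.nb,
                          SVM = fit.svm, KNN = fit.knn, RF = fit.rf))
summary(results)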

## 
## Call:
## summary.resamples(object = results)
## 
## Models: DecisionTree, NaiveBayes, SVM, KNN, RF 
## Number of resamples: 30 
## 
## Accuracy 
##                   Min.   1st Qu.    Median      Mean 3rd Qu. Max. NA's
## DecisionTree 0.8000000 0.9333333 0.9666667 0.9511111       1    1    0
## NaiveBayes   0.8000000 0.9333333 1.0000000 0.9600000       1    1    0
## SVM          0.8666667 0.9333333 1.0000000 0.9666667       1    1    0
## KNN          0.8666667 0.9333333 1.0000000 0.9755556       1    1    0
## RF           0.8000000 0.9333333 1.0000000 0.9555556       1    1    0
## 
## Kappa 
##              Min. 1st Qu. Median      Mean 3rd Qu. Max. NA's
## DecisionTree  0.7     0.9   0.95 0.9266667       1    1    0
## NaiveBayes    0.7     0.9   1.00 0.9400000       1    1    0
## SVM           0.8     0.9   1.00 0.9500000       1    1    0
## KNN           0.8     0.9   1.00 0.9633333       1    1    0
## RF            0.7     0.9   1.00 0.9333333       1    1    0
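In the summary above, each Kappa value tracks its Accuracy value: 0.8 maps to 0.7, 0.9333 to 0.9, and KNN's mean accuracy 0.9756 to a mean Kappa of 0.9633. A plausible explanation, assuming the folds are stratified by Species so that every held-out fold has five rows of each species, is that Cohen's chance agreement p_e = sum over classes of (observed share x predicted share) is then 1/3 whatever the model predicts, giving Kappa = (Accuracy - 1/3) / (2/3). The sketch below computes Cohen's kappa from a confusion matrix for a hypothetical 15-row fold with three errors; `cohen_kappa` is a helper written for illustration.

```r
# Cohen's kappa from a confusion matrix (rows = predicted, columns = observed)
cohen_kappa <- function(cm) {
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                      # observed agreement (accuracy)
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Hypothetical held-out fold: 5 rows per species, 3 of 15 misclassified
lv   <- c("setosa", "versicolor", "virginica")
obs  <- factor(rep(lv, each = 5), levels = lv)
pred <- obs
pred[c(6, 7)] <- "virginica"    # two versicolor rows predicted as virginica
pred[11]      <- "versicolor"   # one virginica row predicted as versicolor
cm <- table(pred, obs)

sum(diag(cm)) / sum(cm)   # accuracy 0.8
cohen_kappa(cm)           # kappa 0.7, i.e. (0.8 - 1/3) / (2/3)
```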

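# Give each panel (Accuracy, Kappa) its own axis range
scales <- list(x = list(relation = "free"), y = list(relation = "free"))
bwplot(results, scales = scales)   # box-and-whisker plot per model and metric

densityplot(results, scales = scales, pch = "|")   # density of the 30 resampled values

dotplot(results, scales = scales)   # mean with a confidence interval

parallelplot(results)   # one line per resample across the models

splom(results)   # scatterplot matrix comparing models pairwise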

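# Pairwise differences between models on the same resamples,
# tested with t-tests and Bonferroni-adjusted p-values
diffs <- diff(results)
summary(diffs)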

## 
## Call:
## summary.diff.resamples(object = diffs)
## 
## p-value adjustment: bonferroni 
## Upper diagonal: estimates of the difference
## Lower diagonal: p-value for H0: difference = 0
## 
## Accuracy 
##              DecisionTree NaiveBayes SVM       KNN       RF       
## DecisionTree              -0.008889  -0.015556 -0.024444 -0.004444
## NaiveBayes   1.0000                  -0.006667 -0.015556  0.004444
## SVM          0.1687       0.8307               -0.008889  0.011111
## KNN          0.1366       0.6984     1.0000               0.020000
## RF           1.0000       1.0000     0.5731    0.1737             
## 
## Kappa 
##              DecisionTree NaiveBayes SVM       KNN       RF       
## DecisionTree              -0.013333  -0.023333 -0.036667 -0.006667
## NaiveBayes   1.0000                  -0.010000 -0.023333  0.006667
## SVM          0.1687       0.8307               -0.013333  0.016667
## KNN          0.1366       0.6984     1.0000               0.030000
## RF           1.0000       1.0000     0.5731    0.1737
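In the lower triangle, each p-value comes from a t-test on the per-resample differences between two models (equivalent to a paired t-test over the same 30 resamples), Bonferroni-adjusted for the choose(5, 2) = 10 pairwise comparisons, i.e. multiplied by 10 and capped at 1. The smallest adjusted value is 0.1366 (DecisionTree vs KNN), so no pair differs significantly at the 5% level. The sketch below back-derives approximate raw p-values from three printed adjusted values (an assumption: printed value / 10) and re-applies the adjustment with base R's `p.adjust`.

```r
n_models <- 5
n_pairs  <- choose(n_models, 2)   # 10 pairwise model comparisons

# Approximate raw p-values, back-derived as printed adjusted value / 10,
# for DecisionTree vs KNN, KNN vs RF and SVM vs RF
raw_p <- c(0.01366, 0.01737, 0.05731)

# Bonferroni: multiply by the number of comparisons, cap at 1
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_pairs)
adj_p                 # 0.1366 0.1737 0.5731
any(adj_p < 0.05)     # FALSE: none of these three pairs differs at the 5% level
```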

Jacob John

Inference: