Complete all exercises and submit your answers to VtopBeta.
### load packages
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(knitr)

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
ir_data=iris
set.seed(100)
head(ir_data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
intrain <- createDataPartition(y = ir_data$Species, p= 0.7, list = FALSE)
training <- ir_data[intrain, ]
testing<-ir_data[-intrain,]
dim(training); dim(testing)
## [1] 105 5
## [1] 45 5
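`createDataPartition()` samples within each level of `Species`, so the 70/30 split keeps the classes balanced: 35 training rows and 15 test rows per species. A minimal base-R sketch of the same idea follows; `stratified_split()` is a hypothetical helper written for illustration, not caret's implementation, so its row indices will not match `intrain` even with the same seed.

```r
# Hypothetical helper: stratified random split using base R only
# (illustrates the idea behind caret::createDataPartition, not its exact algorithm)
stratified_split <- function(y, p = 0.7, seed = 100) {
  set.seed(seed)
  # group row positions by class, then sample a fraction p from each class
  per_class <- lapply(split(seq_along(y), y), function(i) {
    sample(i, size = round(p * length(i)))
  })
  sort(unlist(per_class, use.names = FALSE))
}

in_train <- stratified_split(iris$Species, p = 0.7)
train_sk <- iris[in_train, ]
test_sk  <- iris[-in_train, ]
table(train_sk$Species)  # 35 rows of each species
```

Sampling per class rather than over all 150 rows is what guarantees the balanced 105/45 split seen in the `dim()` output.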
summary(ir_data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
training[["Species"]] = factor(training[["Species"]])
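The linear-kernel SVM below is evaluated with 10-fold cross-validation repeated 3 times; on the test set, its confusion matrix shows an accuracy of 95.56% (43 of 45 flowers classified correctly).
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)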
set.seed(3233)
svm_Linear <- train(Species ~ ., data = training, method = "svmLinear",
                    trControl = trctrl,
                    preProcess = c("center", "scale"),
                    tuneLength = 10)  # with the default grid, C is held at 1 (see output)
svm_Linear
## Support Vector Machines with Linear Kernel
##
## 105 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## Pre-processing: centered (4), scaled (4)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 94, 93, 95, 95, 94, 96, ...
## Resampling results:
##
## Accuracy Kappa
## 0.9589562 0.9381692
##
## Tuning parameter 'C' was held constant at a value of 1
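`trainControl(method = "repeatedcv", number = 10, repeats = 3)` refits the model on 30 resamples: the training rows are split into 10 folds, each fold is held out once, and the whole procedure is repeated 3 times with a fresh shuffle. The Accuracy and Kappa reported above are averages over those 30 held-out folds. The sketch below only builds the held-out index sets in base R; `make_repeated_folds()` is a hypothetical helper and, unlike caret, it does not stratify by class, so its fold sizes (10 or 11 rows) differ slightly from the "Summary of sample sizes" line.

```r
# Hypothetical helper: held-out row indices for k-fold CV repeated `repeats` times
# (no stratification by class, unlike caret)
make_repeated_folds <- function(n, k = 10, repeats = 3, seed = 3233) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(repeats)) {
    # give every row a fold label 1..k, as evenly as possible, in random order
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id == f)
    }
  }
  folds
}

folds <- make_repeated_folds(n = 105)
length(folds)               # 30 resamples
head(105 - lengths(folds))  # training-set size for each resample
```

Within one repeat the 10 folds partition all 105 rows, so every training row is scored exactly once per repeat.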
test_pred <- predict(svm_Linear, newdata = testing)
test_pred
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa setosa versicolor versicolor versicolor
## [19] versicolor versicolor versicolor versicolor virginica versicolor
## [25] versicolor versicolor versicolor virginica versicolor versicolor
## [31] virginica virginica virginica virginica virginica virginica
## [37] virginica virginica virginica virginica virginica virginica
## [43] virginica virginica virginica
## Levels: setosa versicolor virginica
confusionMatrix(test_pred, testing$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 13 0
## virginica 0 2 15
##
## Overall Statistics
##
## Accuracy : 0.9556
## 95% CI : (0.8485, 0.9946)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9333
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.8667 1.0000
## Specificity 1.0000 1.0000 0.9333
## Pos Pred Value 1.0000 1.0000 0.8824
## Neg Pred Value 1.0000 0.9375 1.0000
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.2889 0.3333
## Detection Prevalence 0.3333 0.2889 0.3778
## Balanced Accuracy 1.0000 0.9333 0.9667
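The headline statistics can be recomputed from the counts in the table: accuracy is the share of counts on the diagonal, Cohen's kappa corrects it for the agreement expected by chance from the row and column totals, and the 95% CI is consistent with an exact binomial (Clopper-Pearson) interval as returned by `binom.test()`. A short base-R check, using the counts copied from the output above:

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(15,  0,  0,
                0, 13,  0,
                0,  2, 15),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("setosa", "versicolor", "virginica"),
                             Reference  = c("setosa", "versicolor", "virginica")))

n         <- sum(cm)
accuracy  <- sum(diag(cm)) / n                     # observed agreement
chance    <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa_hat <- (accuracy - chance) / (1 - chance)    # Cohen's kappa
ci        <- binom.test(sum(diag(cm)), n)$conf.int # exact binomial 95% CI

round(c(accuracy = accuracy, kappa = kappa_hat, lower = ci[1], upper = ci[2]), 4)
```

This should reproduce the Accuracy, Kappa and 95% CI lines of the `confusionMatrix()` output.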
# Note: C = 0 leaves the SVM with no support vectors, so every resample fails for it (see warnings below)
grid <- expand.grid(C = c(0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5))
set.seed(3233)
svm_Linear_Grid <- train(Species ~ ., data = training, method = "svmLinear",
                         trControl = trctrl,
                         preProcess = c("center", "scale"),
                         tuneGrid = grid,
                         tuneLength = 10)  # ignored when tuneGrid is supplied
## Warning: model fit failed for Fold01.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep1: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold01.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep2: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold01.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold02.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold03.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold04.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold05.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold06.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold07.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold08.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold09.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning: model fit failed for Fold10.Rep3: C=0.00 Error in .local(x, ...) :
## No Support Vectors found. You may want to change your parameters
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
## Warning in train.default(x, y, weights = w, ...): missing values found in
## aggregated results
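The failures for C = 0 follow from the standard soft-margin SVM dual problem, written here for a two-class problem with labels y_i in {-1, +1} and kernel K (kernlab builds the three-class model from two-class problems of this form). C is the upper bound on every Lagrange multiplier, and the support vectors are exactly the points with a positive multiplier:

$$
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
$$

$$
C = 0 \;\Longrightarrow\; 0 \le \alpha_i \le 0 \;\Longrightarrow\; \alpha_i = 0 \ \text{for all } i,
$$

so no point can become a support vector, which is the "No Support Vectors found" error reported for every fold.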
svm_Linear_Grid
## Support Vector Machines with Linear Kernel
##
## 105 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## Pre-processing: centered (4), scaled (4)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 94, 93, 95, 95, 94, 96, ...
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.00 NaN NaN
## 0.01 0.8820539 0.8228964
## 0.05 0.9375253 0.9060775
## 0.10 0.9612626 0.9416873
## 0.25 0.9589562 0.9381692
## 0.50 0.9626599 0.9437247
## 0.75 0.9519192 0.9276385
## 1.00 0.9589562 0.9381692
## 1.25 0.9619865 0.9428105
## 1.50 0.9619865 0.9428105
## 1.75 0.9619865 0.9428105
## 2.00 0.9619865 0.9428105
## 5.00 0.9717508 0.9575908
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was C = 5.
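The selection rule printed above ("Accuracy was used to select the optimal model using the largest value") amounts to taking the row with the highest resampled accuracy; the NaN row for C = 0 simply drops out. A base-R illustration using the numbers copied from the tuning table:

```r
# Resampled accuracies copied from the tuning table above
tune_res <- data.frame(
  C = c(0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5),
  Accuracy = c(NaN, 0.8820539, 0.9375253, 0.9612626, 0.9589562, 0.9626599,
               0.9519192, 0.9589562, 0.9619865, 0.9619865, 0.9619865,
               0.9619865, 0.9717508))

# which.max() discards NA/NaN values, so the failed C = 0 row cannot be chosen
best <- tune_res[which.max(tune_res$Accuracy), ]
best
```

`trainControl()` also accepts `selectionFunction = "oneSE"`, which picks the simplest model within one standard error of the best instead of the single largest value; whether that would change the choice here was not checked.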
plot(svm_Linear_Grid)
test_pred_grid <- predict(svm_Linear_Grid, newdata = testing)
test_pred_grid
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa setosa versicolor versicolor versicolor
## [19] versicolor versicolor versicolor versicolor virginica versicolor
## [25] versicolor versicolor versicolor virginica versicolor versicolor
## [31] virginica virginica virginica virginica virginica virginica
## [37] virginica virginica virginica virginica virginica virginica
## [43] virginica virginica virginica
## Levels: setosa versicolor virginica
confusionMatrix(test_pred_grid, testing$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 13 0
## virginica 0 2 15
##
## Overall Statistics
##
## Accuracy : 0.9556
## 95% CI : (0.8485, 0.9946)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9333
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.8667 1.0000
## Specificity 1.0000 1.0000 0.9333
## Pos Pred Value 1.0000 1.0000 0.8824
## Neg Pred Value 1.0000 0.9375 1.0000
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.2889 0.3333
## Detection Prevalence 0.3333 0.2889 0.3778
## Balanced Accuracy 1.0000 0.9333 0.9667
library(randomForest)
model <- randomForest(Species ~ ., data = training)  # no set.seed() here, so results can vary between runs
pred <- predict(model, newdata = testing)
table(pred, testing$Species)
##
## pred setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 13 0
## virginica 0 2 15
sum(diag(table(pred, testing$Species))) / nrow(testing)  # correct predictions lie on the diagonal
## [1] 0.9555556
plot(model)
So the random forest classifies 43 of the 45 test flowers correctly, an accuracy of 95.56%.
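`randomForest()` grows each tree on a bootstrap sample of the training rows; the rows a tree never saw are its out-of-bag (OOB) rows, and `plot(model)` draws the OOB error as trees are added. The base-R sketch below shows why roughly a third of the rows are out-of-bag for any one tree: the chance that a given row is never drawn in n draws with replacement is (1 - 1/n)^n, close to exp(-1), about 0.368.

```r
n <- 105  # number of training rows in this split
set.seed(42)

# one bootstrap sample and its out-of-bag rows
boot <- sample(n, n, replace = TRUE)
oob  <- setdiff(seq_len(n), boot)

# average out-of-bag fraction over many bootstrap samples vs. the closed form
oob_frac <- replicate(2000, {
  b <- sample(n, n, replace = TRUE)
  1 - length(unique(b)) / n
})
c(simulated = mean(oob_frac), exact = (1 - 1/n)^n)
```

Because each row is out-of-bag for about a third of the trees, those trees can score it without a separate test set, which is what the OOB curve summarises.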
library(e1071)
model <- naiveBayes(Species ~., data = training)
class(model)
## [1] "naiveBayes"
summary(model)
## Length Class Mode
## apriori 3 table numeric
## tables 4 -none- list
## levels 3 -none- character
## call 4 -none- call
print(model)
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Conditional probabilities:
## Sepal.Length
## Y [,1] [,2]
## setosa 5.071429 0.3409083
## versicolor 5.825714 0.4667427
## virginica 6.540000 0.6611932
##
## Sepal.Width
## Y [,1] [,2]
## setosa 3.517143 0.3416962
## versicolor 2.748571 0.2974118
## virginica 2.962857 0.3263756
##
## Petal.Length
## Y [,1] [,2]
## setosa 1.471429 0.1856173
## versicolor 4.182857 0.4712223
## virginica 5.525714 0.5653437
##
## Petal.Width
## Y [,1] [,2]
## setosa 0.2514286 0.1039554
## versicolor 1.3114286 0.1794951
## virginica 1.9885714 0.2857101
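For numeric predictors `naiveBayes()` fits one Gaussian per class and predictor; in the printed tables, column `[,1]` is the class mean and `[,2]` the standard deviation. A prediction multiplies the class prior by the four Gaussian densities and picks the largest product. The base-R sketch below does this by hand, in log space, for the first iris flower using the numbers printed above; it ignores the small-density threshold that `predict.naiveBayes` applies, so probabilities may differ slightly in edge cases.

```r
# Class means ([,1]) and standard deviations ([,2]) copied from print(model) above
means <- rbind(setosa     = c(5.071429, 3.517143, 1.471429, 0.2514286),
               versicolor = c(5.825714, 2.748571, 4.182857, 1.3114286),
               virginica  = c(6.540000, 2.962857, 5.525714, 1.9885714))
sds   <- rbind(setosa     = c(0.3409083, 0.3416962, 0.1856173, 0.1039554),
               versicolor = c(0.4667427, 0.2974118, 0.4712223, 0.1794951),
               virginica  = c(0.6611932, 0.3263756, 0.5653437, 0.2857101))
prior <- c(setosa = 1/3, versicolor = 1/3, virginica = 1/3)

x <- c(5.1, 3.5, 1.4, 0.2)  # Sepal.Length, Sepal.Width, Petal.Length, Petal.Width of iris row 1

# log prior + sum of log Gaussian densities: one score per class
log_post <- sapply(names(prior), function(k)
  log(prior[[k]]) + sum(dnorm(x, mean = means[k, ], sd = sds[k, ], log = TRUE)))

post <- exp(log_post - max(log_post))  # subtract the max for numerical stability
round(post / sum(post), 4)             # normalised class probabilities
names(which.max(log_post))             # predicted class
```

Working with log densities avoids underflow: the versicolor and virginica densities for a petal length of 1.4 are tiny.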
preds <- predict(model, newdata = training)  # predictions on the training set, not the test set
table(preds, training$Species)
##
## preds setosa versicolor virginica
## setosa 35 0 0
## versicolor 0 33 3
## virginica 0 2 32
sum(diag(table(preds, training$Species))) / nrow(training)
## [1] 0.952381
So naive Bayes classifies 100 of the 105 training flowers correctly (95.24%). This is training accuracy, so it is not directly comparable with the test-set accuracies of the other models.
dtree_fit <- train(Species ~ ., data = training, method = "rpart",
                   parms = list(split = "information"),
                   trControl = trctrl,
                   tuneLength = 10)
dtree_fit
## CART
##
## 105 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 96, 96, 94, 93, 93, 94, ...
## Resampling results across tuning parameters:
##
## cp Accuracy Kappa
## 0.00000000 0.9432323 0.9135621
## 0.05555556 0.9432323 0.9135621
## 0.11111111 0.9432323 0.9135621
## 0.16666667 0.9432323 0.9135621
## 0.22222222 0.9432323 0.9135621
## 0.27777778 0.9432323 0.9135621
## 0.33333333 0.9432323 0.9135621
## 0.38888889 0.9432323 0.9135621
## 0.44444444 0.8632323 0.7992764
## 0.50000000 0.3920202 0.1285714
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.3888889.
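`parms = list(split = "information")` asks rpart to choose splits by information gain (entropy reduction) instead of the default Gini index. A base-R sketch of that criterion follows; `entropy()` and `info_gain()` are hypothetical helpers, and rpart's internal scaling may differ, but the idea of scoring a candidate split is the same. On the full iris data, the split Petal.Length < 2.45 isolates setosa:

```r
# Hypothetical helpers: Shannon entropy (in bits) and information gain of a binary split
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]  # empty classes contribute 0
  -sum(p * log2(p))
}

info_gain <- function(y, left) {
  n <- length(y)
  entropy(y) -
    (sum(left) / n)  * entropy(y[left]) -
    (sum(!left) / n) * entropy(y[!left])
}

gain <- info_gain(iris$Species, iris$Petal.Length < 2.45)
gain  # log2(3) - 2/3, about 0.918 bits
```

The left node holds only setosa (entropy 0) and the right node an even versicolor/virginica mix (entropy 1 bit), which gives the gain shown.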
library(rpart.plot)
library(RColorBrewer)
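prp(dtree_fit$finalModel, box.palette = "Reds", tweak = 1.2)
test_pred <- predict(dtree_fit, newdata = testing)
# NOTE: the lines below reuse the naive Bayes `model` on the training set, not `dtree_fit` on the test set
preds <- predict(model, newdata = training)
table(preds, training$Species)
##
## preds setosa versicolor virginica
## setosa 35 0 0
## versicolor 0 33 3
## virginica 0 2 32
sum(diag(table(preds, training$Species))) / nrow(training)
## [1] 0.952381
So the 95.24% above repeats the naive Bayes training accuracy rather than measuring the decision tree. The tree's test accuracy would come from `confusionMatrix(test_pred, testing$Species)` using the `test_pred` computed from `dtree_fit`; its cross-validated accuracy on the training folds is 94.32%.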
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Species ~., data = training, method = "knn",
trControl=trctrl,
preProcess = c("center", "scale"),
tuneLength = 10)
knn_fit
## k-Nearest Neighbors
##
## 105 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## Pre-processing: centered (4), scaled (4)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 95, 95, 93, 94, 93, 95, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9602862 0.9400659
## 7 0.9512626 0.9263538
## 9 0.9401515 0.9096871
## 11 0.9425589 0.9133488
## 13 0.9483670 0.9220234
## 15 0.9434343 0.9145537
## 17 0.9337374 0.8999448
## 19 0.9236700 0.8848134
## 21 0.9239226 0.8853054
## 23 0.9183670 0.8768944
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
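With `method = "knn"`, a test flower is assigned the majority class among its k nearest training flowers (here k = 5), after centering and scaling with the training statistics (`preProcess = c("center", "scale")`). A base-R sketch of that rule follows; `knn_predict()` is a hypothetical helper using Euclidean distance with ties going to the first factor level, which may differ from how caret breaks ties, and it runs on its own random split rather than `training`/`testing`.

```r
# Hypothetical helper: k-nearest-neighbour majority vote with Euclidean distance
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  # center and scale using training-set statistics only
  mu <- colMeans(train_x)
  s  <- apply(train_x, 2, sd)
  tr <- sweep(sweep(train_x, 2, mu), 2, s, "/")
  te <- sweep(sweep(test_x, 2, mu), 2, s, "/")
  apply(te, 1, function(row) {
    d <- sqrt(colSums((t(tr) - row)^2))          # distance to every training row
    votes <- table(train_y[order(d)[seq_len(k)]]) # class counts among the k nearest
    names(votes)[which.max(votes)]               # ties: first level wins
  })
}

set.seed(100)
idx  <- sample(nrow(iris), 105)
x    <- as.matrix(iris[, 1:4])
pred <- knn_predict(x[idx, ], iris$Species[idx], x[-idx, ], k = 5)
mean(pred == as.character(iris$Species[-idx]))  # test accuracy of this sketch
```

Scaling matters here because kNN compares raw distances: without it, the wider-ranging petal and sepal lengths would dominate the widths.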
plot(knn_fit)
test_pred <- predict(knn_fit, newdata = testing)
test_pred
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] versicolor setosa setosa versicolor versicolor versicolor
## [19] versicolor versicolor versicolor versicolor virginica versicolor
## [25] versicolor versicolor versicolor virginica versicolor versicolor
## [31] virginica virginica virginica virginica virginica virginica
## [37] virginica virginica virginica virginica virginica virginica
## [43] virginica virginica virginica
## Levels: setosa versicolor virginica
confusionMatrix(test_pred, testing$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 14 0 0
## versicolor 1 13 0
## virginica 0 2 15
##
## Overall Statistics
##
## Accuracy : 0.9333
## 95% CI : (0.8173, 0.986)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 0.9333 0.8667 1.0000
## Specificity 1.0000 0.9667 0.9333
## Pos Pred Value 1.0000 0.9286 0.8824
## Neg Pred Value 0.9677 0.9355 1.0000
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3111 0.2889 0.3333
## Detection Prevalence 0.3111 0.3111 0.3778
## Balanced Accuracy 0.9667 0.9167 0.9667
So k-NN classifies 42 of the 45 test flowers correctly, an accuracy of 93.33%.
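So, on this test split, the linear SVM (default and grid-tuned) and the random forest perform best at 95.56% accuracy, ahead of k-NN at 93.33%. Naive Bayes was only scored on its training data and the decision tree's test accuracy was not computed, so neither is directly comparable.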
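With only 45 test flowers, one misclassification moves accuracy by about 2.2 percentage points, so these differences are small. Using the correct-prediction counts read off the confusion matrices (43 of 45 for both SVMs and the random forest, 42 of 45 for k-NN), exact binomial intervals like the ones `confusionMatrix()` reports show how much the models overlap:

```r
# Correct test-set predictions, read off the diagonals of the tables above
correct <- c(svm_linear = 43, svm_grid = 43, random_forest = 43, knn = 42)
n_test  <- 45

# accuracy with an exact binomial 95% interval for each model
summary_tab <- t(sapply(correct, function(k) {
  ci <- binom.test(k, n_test)$conf.int
  c(accuracy = k / n_test, lower = ci[1], upper = ci[2])
}))
round(summary_tab, 4)
```

The intervals overlap heavily, so this single split does not clearly separate the models; a larger test set or the cross-validated results are better guides.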