Use the svm() algorithm of the e1071 package to fit a support vector machine to the PlantGrowth data set. Then discuss the number of support vectors/samples. [Install the e1071 package in R if needed.]
library(e1071)  # provides svm()
library(caret)  # provides confusionMatrix() and createDataPartition()
p <- PlantGrowth
cbind(p[1:10,],p[11:20,],p[21:30,])
p_svm <- svm(group ~ weight, data = p)
summary(p_svm)
##
## Call:
## svm(formula = group ~ weight, data = p)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 29
##
## ( 10 9 10 )
##
##
## Number of Classes: 3
##
## Levels:
## ctrl trt1 trt2
p_pred <- predict(p_svm, newdata = p)
table(p_pred)
## p_pred
## ctrl trt1 trt2
## 3 10 17
confusionMatrix(p_pred, p$group)
## Confusion Matrix and Statistics
##
## Reference
## Prediction ctrl trt1 trt2
## ctrl 0 2 1
## trt1 4 6 0
## trt2 6 2 9
##
## Overall Statistics
##
## Accuracy : 0.5
## 95% CI : (0.313, 0.687)
## No Information Rate : 0.333
## P-Value [Acc > NIR] : 0.0435
##
## Kappa : 0.25
##
## Mcnemar's Test P-Value : 0.1006
##
## Statistics by Class:
##
## Class: ctrl Class: trt1 Class: trt2
## Sensitivity 0.000 0.600 0.900
## Specificity 0.850 0.800 0.600
## Pos Pred Value 0.000 0.600 0.529
## Neg Pred Value 0.630 0.800 0.923
## Prevalence 0.333 0.333 0.333
## Detection Rate 0.000 0.200 0.300
## Detection Prevalence 0.100 0.333 0.567
## Balanced Accuracy 0.425 0.700 0.750
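The summary above reports 29 support vectors for 30 samples. As a minimal sketch of how that ratio can be read from the fitted object rather than from the printed summary, the snippet below uses the `tot.nSV` and `nSV` components of the e1071 model object; the name `sv_fraction` is introduced here only for illustration.

```r
library(e1071)

# Refit the same one-predictor model as above so the snippet is self-contained.
p <- PlantGrowth
p_svm <- svm(group ~ weight, data = p)

# Support vectors per class and in total, taken from the fitted object.
p_svm$nSV
p_svm$tot.nSV

# Fraction of training samples retained as support vectors
# (sv_fraction is an illustrative name, not part of e1071).
sv_fraction <- p_svm$tot.nSV / nrow(p)
sv_fraction
```

A fraction close to 1 means that nearly every sample lies on or inside the margin, which is consistent with the three groups overlapping heavily on the single predictor `weight` and with the 50% accuracy shown above.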
Repeat the analysis for the iris data set. Discuss the number of support vectors/samples.
i <- iris
i
i_svm <- svm(Species ~ ., data = i)
summary(i_svm)
##
## Call:
## svm(formula = Species ~ ., data = i)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
i_pred <- predict(i_svm, newdata = i)
table(i_pred)
## i_pred
## setosa versicolor virginica
## 50 50 50
confusionMatrix(i_pred, i$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 2
## virginica 0 2 48
##
## Overall Statistics
##
## Accuracy : 0.973
## 95% CI : (0.933, 0.993)
## No Information Rate : 0.333
## P-Value [Acc > NIR] : <0.0000000000000002
##
## Kappa : 0.96
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.000 0.960 0.960
## Specificity 1.000 0.980 0.980
## Pos Pred Value 1.000 0.960 0.960
## Neg Pred Value 1.000 0.980 0.980
## Prevalence 0.333 0.333 0.333
## Detection Rate 0.333 0.320 0.320
## Detection Prevalence 0.333 0.333 0.333
## Balanced Accuracy 1.000 0.970 0.970
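The accuracy and sensitivity figures reported by confusionMatrix() follow directly from the counts in the table. The sketch below re-enters the iris confusion matrix from the output above by hand (rows are predictions, columns are the reference classes) and recomputes both; the object names are illustrative.

```r
# Confusion matrix copied from the output above: rows = prediction, columns = reference.
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(50,  0,  0,
                0, 48,  2,
                0,  2, 48),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

# Overall accuracy: correctly classified samples (the diagonal) over all samples.
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 3)   # 0.973

# Per-class sensitivity: correct predictions over the reference total of each class.
sensitivity <- diag(cm) / colSums(cm)
sensitivity          # 1.00, 0.96, 0.96
```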
The iris data set produced an SVM model with 51 support vectors for 150 samples. About one third of the samples are therefore used as support vectors, which suggests the model might be overfitting; note that the accuracy of over 97% was measured on the same data used for training.
Use the iris data set (or any other data set) to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.
set.seed(42)
index <- createDataPartition(iris$Species, p = 0.80, list = FALSE)
train <- iris[index, ]
test <- iris[-index, ]
svm_iris <- svm(Species ~ ., data = train)
summary(svm_iris)
##
## Call:
## svm(formula = Species ~ ., data = train)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 44
##
## ( 6 19 19 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
pred_iris_train <- predict(svm_iris, newdata = train)
table(pred_iris_train)
## pred_iris_train
## setosa versicolor virginica
## 40 40 40
confusionMatrix(pred_iris_train, train$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 40 0 0
## versicolor 0 39 1
## virginica 0 1 39
##
## Overall Statistics
##
## Accuracy : 0.983
## 95% CI : (0.941, 0.998)
## No Information Rate : 0.333
## P-Value [Acc > NIR] : <0.0000000000000002
##
## Kappa : 0.975
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.000 0.975 0.975
## Specificity 1.000 0.988 0.988
## Pos Pred Value 1.000 0.975 0.975
## Neg Pred Value 1.000 0.988 0.988
## Prevalence 0.333 0.333 0.333
## Detection Rate 0.333 0.325 0.325
## Detection Prevalence 0.333 0.333 0.333
## Balanced Accuracy 1.000 0.981 0.981
pred_iris_test <- predict(svm_iris, newdata = test)
table(pred_iris_test)
## pred_iris_test
## setosa versicolor virginica
## 9 12 9
confusionMatrix(pred_iris_test, test$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 9 0 0
## versicolor 1 9 2
## virginica 0 1 8
##
## Overall Statistics
##
## Accuracy : 0.867
## 95% CI : (0.693, 0.962)
## No Information Rate : 0.333
## P-Value [Acc > NIR] : 0.0000000023
##
## Kappa : 0.8
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 0.900 0.900 0.800
## Specificity 1.000 0.850 0.950
## Pos Pred Value 1.000 0.750 0.889
## Neg Pred Value 0.952 0.944 0.905
## Prevalence 0.333 0.333 0.333
## Detection Rate 0.300 0.300 0.267
## Detection Prevalence 0.300 0.400 0.300
## Balanced Accuracy 0.950 0.875 0.875
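The SVM model trained on 80% of the iris data used 44 data points as support vectors, a bit over one third of the 120 training samples. The accuracy on the training data is over 98%, but it drops to under 87% on the test data, which indicates that the model may be overfit. With only 30 test samples the 95% CI for the test accuracy is wide (0.693 to 0.962), so the size of this gap should be read with caution.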
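The comparison of training and test accuracy rests on a single 80/20 split, so the estimate depends on which 30 samples happen to land in the test set. One way to get a less split-dependent estimate is k-fold cross-validation, which svm() supports through its `cross` argument. The following is a sketch, not part of the original analysis: the choice of 10 folds and the name `cv_fit` are assumptions made here.

```r
library(e1071)

set.seed(42)  # fold assignment is random, so fix the seed for reproducibility

# Same model as above, with 10-fold cross-validation run on the training data.
# (10 folds is an arbitrary choice for this sketch.)
cv_fit <- svm(Species ~ ., data = iris, cross = 10)

# Accuracy of each fold and the overall cross-validated accuracy, both in percent.
cv_fit$accuracies
cv_fit$tot.accuracy
```

If the cross-validated accuracy stays close to the training accuracy, the drop seen on the 30-sample test set is more likely due to the small test set than to overfitting.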