Use the svm() function from the e1071 package to fit a support vector machine to the PlantGrowth dataset. Then discuss the number of support vectors/samples.
library(e1071)
library(caret)
data("PlantGrowth")
str(PlantGrowth)
## 'data.frame': 30 obs. of 2 variables:
## $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
## $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
PlantGrowth_svm <- svm(group ~ ., data = PlantGrowth)
summary(PlantGrowth_svm)
##
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 29
##
## ( 10 9 10 )
##
##
## Number of Classes: 3
##
## Levels:
## ctrl trt1 trt2
print(PlantGrowth_svm)
##
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 29
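The fitted model uses 29 support vectors (10 for ctrl, 9 for trt1, 10 for trt2), so 29 of the 30 observations are support vectors. With weight as the only predictor, this suggests that the three groups overlap heavily and that almost every observation lies on or inside a margin.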
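The support vector counts can also be read directly from the fitted object instead of the printed summary. The sketch below refits the same model under a new name (`fit`, chosen here for illustration) and assumes the `tot.nSV`, `nSV`, `levels`, and `labels` components of an e1071 `svm` object. It also prints the range of weight per group as a rough check on how much the groups overlap.

```r
library(e1071)
data("PlantGrowth")

# Same default model as above (radial kernel, cost = 1)
fit <- svm(group ~ ., data = PlantGrowth)

# Total number of support vectors and their share of the data
total_sv <- fit$tot.nSV
share_sv <- total_sv / nrow(PlantGrowth)
print(total_sv)
print(share_sv)

# Per-class counts, labelled with the class names
sv_per_class <- setNames(fit$nSV, fit$levels[fit$labels])
print(sv_per_class)

# Minimum and maximum weight in each group; overlapping ranges mean
# that weight alone cannot separate the groups cleanly
weight_ranges <- sapply(split(PlantGrowth$weight, PlantGrowth$group), range)
print(weight_ranges)
```

A share close to 1 means the model keeps nearly every training point, which is typical when the classes are hard to separate with the available predictors.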
Do a similar svm() analysis as that in the previous question using the Iris dataset. Discuss the number of support vectors/samples.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris_svm <- svm(Species ~., data = iris)
summary(iris_svm)
##
## Call:
## svm(formula = Species ~ ., data = iris)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
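The model uses 51 support vectors out of 150 observations (8 for setosa, 22 for versicolor, 21 for virginica). Setosa needs far fewer support vectors because it is well separated from the other two species, whereas versicolor and virginica overlap and need more points to define their boundary.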
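To look at the support vectors in more detail, the sketch below tabulates them by species using the `index` component of the fitted object, then refits the model over a small grid of `cost` values to see how the count changes. The cost grid is a hypothetical choice for illustration and is not part of the original analysis.

```r
library(e1071)

iris_svm <- svm(Species ~ ., data = iris)

# index holds the row positions of the support vectors in the training data
sv_by_species <- table(iris$Species[iris_svm$index])
print(sv_by_species)

# Hypothetical cost grid (assumption, chosen only for illustration)
costs <- c(0.1, 1, 10, 100)
n_sv <- sapply(costs, function(cost_value) {
  svm(Species ~ ., data = iris, cost = cost_value)$tot.nSV
})
print(data.frame(cost = costs, support_vectors = n_sv))
```

A larger cost penalizes margin violations more strongly, so the number of support vectors usually falls as cost rises, at the price of a boundary that follows the training data more closely.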
Use the Iris dataset (or any other dataset) to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.
set.seed(123)
indexes <- createDataPartition(iris$Species, p = .8, list = FALSE)
train <- iris[indexes,]
test <- iris[-indexes,]
model_svm <- svm(Species~., data = train)
print(model_svm)
##
## Call:
## svm(formula = Species ~ ., data = train)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 45
45 of the 120 training observations are used as support vectors.
# predict test data with the fitted model
pred <- predict(model_svm, test)
print(pred)
## 1 2 6 16 23 34 35
## setosa setosa setosa setosa setosa setosa setosa
## 38 44 47 51 60 64 73
## setosa setosa setosa versicolor versicolor versicolor versicolor
## 74 87 91 94 95 97 104
## versicolor versicolor versicolor versicolor versicolor versicolor virginica
## 109 111 113 116 120 127 134
## virginica virginica virginica virginica versicolor virginica versicolor
## 138 147
## virginica virginica
## Levels: setosa versicolor virginica
table(pred, test$Species)
##
## pred setosa versicolor virginica
## setosa 10 0 0
## versicolor 0 10 2
## virginica 0 0 8
mean(pred==test$Species)
## [1] 0.9333333
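The confusion matrix shows two observations (rows 120 and 134) whose actual species is virginica but which the model predicts as versicolor. All setosa and versicolor observations are classified correctly. The overall accuracy is 28/30, about 93%, which is good, although a single split with only 30 test observations gives a rough estimate.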
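Accuracy alone hides which classes the errors fall in. As a minimal sketch, the block below re-enters the confusion matrix printed above by hand (so it runs without the fitted model) and derives accuracy, per-class recall, and per-class precision in base R. The names `conf_mat`, `recall`, and `precision` are introduced here for illustration.

```r
# Confusion matrix copied from the table() output above:
# rows are predictions, columns are the actual species
species <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(
  c(10,  0, 0,
     0, 10, 2,
     0,  0, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(pred = species, actual = species)
)

# Share of all test observations classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Recall: correct predictions divided by the actual count per class
recall <- diag(conf_mat) / colSums(conf_mat)

# Precision: correct predictions divided by the predicted count per class
precision <- diag(conf_mat) / rowSums(conf_mat)

print(round(accuracy, 3))
print(round(rbind(recall, precision), 3))
```

Here the two errors show up as a virginica recall of 8/10 and a versicolor precision of 10/12, while setosa is perfect on both measures.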