Ex.1

Use the svm() function from the e1071 package to fit a support vector machine to the PlantGrowth dataset. Then discuss the number of support vectors/samples.

library(e1071)
## Warning: package 'e1071' was built under R version 4.1.2
library(caret)
## Warning: package 'caret' was built under R version 4.1.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.2
## Loading required package: lattice
data("PlantGrowth")
str(PlantGrowth)
## 'data.frame':    30 obs. of  2 variables:
##  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
PlantGrowth_svm <- svm(group ~ ., data = PlantGrowth)
summary(PlantGrowth_svm)
## 
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  29
## 
##  ( 10 9 10 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  ctrl trt1 trt2
print(PlantGrowth_svm)
## 
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  29

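With the default settings (radial kernel, cost = 1), svm() keeps 29 of the 30 observations as support vectors: 10 from ctrl, 9 from trt1, and 10 from trt2. Almost every sample is a support vector, which suggests that the single predictor, weight, does not separate the three groups cleanly.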
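The counts printed by summary() can also be read straight from the fitted object. The following is a minimal sketch, assuming the `tot.nSV`, `nSV`, and `index` components of an e1071 `svm` object; the names `fit`, `sv_fraction`, and `sv_rows` are illustrative and not part of the original analysis.

```r
library(e1071)

data("PlantGrowth")
fit <- svm(group ~ ., data = PlantGrowth)

# Total number of support vectors and the per-class breakdown
fit$tot.nSV
fit$nSV

# Share of the training samples that ended up as support vectors
sv_fraction <- fit$tot.nSV / nrow(PlantGrowth)
sv_fraction

# Rows of PlantGrowth that were kept as support vectors, tallied by group
sv_rows <- PlantGrowth[fit$index, ]
table(sv_rows$group)
```

A share close to 1 means the model relies on nearly every training point to define its decision boundaries.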

Ex.2

Repeat the svm() analysis from the previous question using the iris dataset. Discuss the number of support vectors/samples.

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris_svm <- svm(Species ~ ., data = iris)
summary(iris_svm)
## 
## Call:
## svm(formula = Species ~ ., data = iris)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  51
## 
##  ( 8 22 21 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica

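There are 51 support vectors out of 150 observations: 8 from setosa, 22 from versicolor, and 21 from virginica. Setosa needs far fewer support vectors than the other two species, which is consistent with it being the easiest class to separate.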
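To put the two exercises side by side, the share of samples used as support vectors can be computed from the counts reported in the summaries. This is a small arithmetic sketch; the vectors `n_sv`, `n_obs`, and `sv_share` are illustrative names, and the numbers are copied from the output above rather than recomputed.

```r
# Support-vector counts and sample sizes copied from the summaries above
n_sv  <- c(PlantGrowth = 29, iris = 51)
n_obs <- c(PlantGrowth = 30, iris = 150)

# Fraction of observations that are support vectors in each model
sv_share <- n_sv / n_obs
round(sv_share, 3)
```

The iris model uses roughly a third of its samples as support vectors, whereas the PlantGrowth model uses nearly all of them.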

Ex.3

Use the iris dataset (or any other dataset) to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.

set.seed(123)
indexes <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[indexes,]
test <- iris[-indexes,]
model_svm <- svm(Species ~ ., data = train)
print(model_svm)
## 
## Call:
## svm(formula = Species ~ ., data = train)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  45

45 of the 120 training samples are used as support vectors.

# predict test data with the fitted model
pred <- predict(model_svm, test)
print(pred)
##          1          2          6         16         23         34         35 
##     setosa     setosa     setosa     setosa     setosa     setosa     setosa 
##         38         44         47         51         60         64         73 
##     setosa     setosa     setosa versicolor versicolor versicolor versicolor 
##         74         87         91         94         95         97        104 
## versicolor versicolor versicolor versicolor versicolor versicolor  virginica 
##        109        111        113        116        120        127        134 
##  virginica  virginica  virginica  virginica versicolor  virginica versicolor 
##        138        147 
##  virginica  virginica 
## Levels: setosa versicolor virginica
table(pred, test$Species)
##             
## pred         setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         10         2
##   virginica       0          0         8
mean(pred==test$Species)
## [1] 0.9333333

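The confusion table shows that 2 test samples whose actual species is virginica (observations 120 and 134) are predicted as versicolor; every setosa and versicolor sample is classified correctly. The model therefore gets 28 of the 30 test samples right, an accuracy of about 93%, which is good for a model fitted with default settings.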
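The accuracy figure hides which class the errors come from, so per-class precision and recall are worth a look. The sketch below rebuilds the confusion table from the values printed above (rows are predictions, columns are actual species) and derives the metrics by hand; `cm`, `recall`, and `precision` are illustrative names, not objects from the original session.

```r
# Confusion matrix copied from the table() output above
# (rows = predicted class, columns = actual class)
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10,  0, 0,
                0, 10, 2,
                0,  0, 8),
             nrow = 3, byrow = TRUE,
             dimnames = list(pred = classes, actual = classes))

# Overall accuracy: correct predictions over all test samples
accuracy <- sum(diag(cm)) / sum(cm)

# Recall: share of each actual class that was predicted correctly
recall <- diag(cm) / colSums(cm)

# Precision: share of each predicted class that is actually that class
precision <- diag(cm) / rowSums(cm)

round(accuracy, 4)
round(recall, 3)
round(precision, 3)
```

Recall for virginica is 8/10 and precision for versicolor is 10/12, which pins both errors on the versicolor/virginica boundary.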