Exercise 1

Use the svm() algorithm of the e1071 package to carry out the support vector machine for the PlantGrowth data set. Then, discuss the number of support vectors/samples.

library(e1071)  # provides svm()

data(PlantGrowth)
x <- PlantGrowth$weight   # predictor (not needed for the formula interface below)
y <- PlantGrowth$group    # response

model_svm <- svm(group ~ ., data = PlantGrowth)
summary(model_svm)
## 
## Call:
## svm(formula = group ~ ., data = PlantGrowth)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  29
## 
##  ( 10 9 10 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  ctrl trt1 trt2

There are 29 support vectors, split 10/9/10 across the three groups (ctrl, trt1, trt2). Support vectors are the training points that lie on or near the decision boundaries and therefore determine where those boundaries are placed. Since PlantGrowth has only 30 observations and a single predictor (weight), 29 of the 30 points, essentially the whole data set, end up as support vectors, which indicates that the three groups overlap heavily and the model cannot separate them with a comfortable margin.
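To make that overlap concrete, here is a minimal sketch (using the model_svm object fitted above) that compares the fitted classes against the true groups; the exact counts depend on the fit, so the table is not reproduced here.

# Training confusion table: with weight as the only predictor, the three
# groups overlap and the fitted classes do not match the true groups cleanly.
table(predicted = fitted(model_svm), actual = PlantGrowth$group)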

Exercise 2

Do a similar SVM analysis as that in the previous question using the iris data set. Discuss the number of support vectors/samples.

data(iris)
x <- iris[, 1:4]   # predictors (not needed for the formula interface below)
y <- iris[, 5]     # response

iris_svm <- svm(Species ~ ., data = iris)
summary(iris_svm)
## 
## Call:
## svm(formula = Species ~ ., data = iris)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  51
## 
##  ( 8 22 21 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica

There are 51 support vectors out of 150 observations, split 8/22/21 across setosa, versicolor, and virginica. A much smaller fraction of the data serves as support vectors here (51/150, about 34%) than in PlantGrowth (29/30, about 97%): with four predictors, setosa is well separated and contributes only 8 support vectors, while versicolor and virginica overlap more and account for most of the rest. As with PlantGrowth, there are 3 classes.
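To see which observations act as support vectors, a short sketch below uses fields exposed by the e1071 svm object and the plot method for svm fits; the slice values that hold the remaining two predictors fixed are illustrative choices, not part of the original analysis.

iris_svm$nSV          # support vectors per class (matches the summary above)
head(iris_svm$index)  # row numbers of the observations used as support vectors

# 2-D view of the fitted decision regions; Sepal.Width and Sepal.Length are
# held at illustrative values so the surface can be drawn in two dimensions.
plot(iris_svm, iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 6))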

Exercise 3

Use the iris data set (or any other data set) to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.

set.seed(123)                                      # reproducible split
samples <- sample(nrow(iris), nrow(iris) * 0.80)   # 120 training rows
train <- iris[samples, ]
test  <- iris[-samples, ]

iris_svm_train <- svm(Species ~ ., data = train)
summary(iris_svm_train)
## 
## Call:
## svm(formula = Species ~ ., data = train)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  46
## 
##  ( 8 19 19 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica
pred_test <- predict(iris_svm_train,test)

table(pred_test,test$Species)
##             
## pred_test    setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         14         0
##   virginica       0          1         5
accuracy <- sum(pred_test == test$Species)/length(test$Species)
print(paste("The accuracy on the held-out test set is", accuracy))
## [1] "The accuracy on the held-out test set is 0.966666666666667"

On the held-out 20%, only 1 of the 30 test samples (one versicolor predicted as virginica) was misclassified, giving an accuracy of roughly 96.7%. The model trained on the 120 training rows used 46 support vectors, a proportion similar to the full-data fit in Exercise 2, so reducing the training set did not change the character of the solution much.
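A natural follow-up, sketched below, is to tune cost and gamma by cross-validation on the training split with e1071's tune() and re-check the held-out accuracy; the grid values are illustrative, not prescribed by the exercise.

# Cross-validated grid search over cost and gamma on the training split
# (grid values are illustrative).
set.seed(123)
iris_tune <- tune(svm, Species ~ ., data = train,
                  ranges = list(cost = c(0.1, 1, 10), gamma = c(0.05, 0.1, 0.5)))
summary(iris_tune)

# Re-evaluate the best tuned model on the 20% hold-out
pred_tuned <- predict(iris_tune$best.model, test)
mean(pred_tuned == test$Species)   # hold-out accuracy of the tuned model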