Ex1 Use the svm() algorithm of the e1071 package to carry out the support vector machine for the PlantGrowth data set. Then, discuss the number of support vectors.

library(e1071)
## Warning: package 'e1071' was built under R version 4.2.2
# Fit an SVM classifier predicting group (ctrl, trt1, trt2) from weight
svm_model <- svm(formula = group ~ ., data = PlantGrowth)

# Each row of $SV is one support vector
num <- nrow(svm_model$SV)
num
## [1] 29

There are 29 support vectors out of only 30 observations in PlantGrowth, so nearly every data point is needed to define the decision boundary. This suggests the groups are difficult to separate, which makes sense given that weight is the only predictor and the three groups overlap substantially, and it raises the possibility that the model is overfitting.
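
To put the count in context, it helps to compare it with the size of the data set and look at the per-class breakdown. A minimal sketch reusing the model fitted above (nSV is the per-class support-vector count stored on an e1071 svm object):

nrow(PlantGrowth)   # 30 observations in total
svm_model$nSV       # support vectors per group (ctrl, trt1, trt2)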

Ex2 Do an SVM analysis similar to that in the previous question using the iris data set. Discuss the number of support vectors/samples.

# Fit a multi-class SVM predicting Species from the four flower measurements
svm_model <- svm(Species ~ ., data = iris)

num_support_vectors <- nrow(svm_model$SV)
num_support_vectors
## [1] 51

Relative to the size of the data set, there are considerably fewer support vectors than for PlantGrowth: 51 out of 150 observations, roughly a third. With only three species and four informative measurements, the classes are much easier to separate, so the algorithm needs far fewer points to define its boundaries.
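
As a quick sanity check, the per-class support-vector counts and a confusion matrix on the training data show how the 51 support vectors are distributed and how well the fitted boundaries separate the three species. A small sketch reusing the model fitted above:

svm_model$nSV                         # support vectors per species
table(Predicted = fitted(svm_model),  # confusion matrix on the training data
      Actual = iris$Species)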

Ex3 Use the iris data set to select 80% of the samples for training svm(), then use the remaining 20% for validation. Discuss your results.

# Randomly choose 80% of the row indices for training; the rest form the test set
train_indices <- sample(1:nrow(iris), 0.8 * nrow(iris))
iris_train <- iris[train_indices, ]
iris_test <- iris[-train_indices, ]

# Fit the SVM on the training set only
svm_model <- svm(Species ~ ., data = iris_train)

# Predict species for the held-out 20%
svm_predictions <- predict(svm_model, newdata = iris_test)

# Proportion of correct predictions on the test set
svm_accuracy <- sum(svm_predictions == iris_test$Species) / nrow(iris_test)
svm_accuracy
## [1] 0.9666667

The model generalizes well: it classifies about 97% of the held-out test samples correctly (29 of 30). Since the split is random, the exact accuracy will vary slightly from run to run unless a seed is set before sampling.
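
A confusion matrix on the held-out samples shows which species account for the few misclassifications. A small sketch reusing the objects from above; the exact counts depend on the random split:

# Rows are predicted species, columns are the true species in the test set
table(Predicted = svm_predictions, Actual = iris_test$Species)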