Model 3: Support Vector Machine (SVM)

Support Vector Machines (SVMs) are another family of classification models in supervised machine learning. An SVM finds the hyperplane that maximizes the margin between the classes of the dependent variable (here, Species). The svm() function is part of the e1071 library, among other libraries.
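To make the margin idea a little more concrete, here is a minimal sketch. It assumes a linear kernel and a two-class subset of iris (both are choices made for this illustration and are not used in the rest of the post). The observations that sit on or inside the margin are the support vectors, and they alone determine the separating hyperplane.

```r
library(e1071)

# Illustration only: keep two of the three species so there is a single boundary
two_class <- droplevels(subset(iris, Species != "setosa"))

# Linear kernel (an assumption for this sketch; svm() defaults to a radial kernel)
m_lin <- svm(Species ~ ., data = two_class, kernel = "linear")

# Number of support vectors, and the first few rows of two_class that act as support vectors
n_sv <- m_lin$tot.nSV
sv_rows <- m_lin$index
print(n_sv)
print(head(two_class[sv_rows, ]))
```

Only a fraction of the rows end up as support vectors; moving any other row slightly would leave the fitted boundary unchanged.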
The approach will be the same: we’ll run the model 1,000 times and then calculate the average accuracy.
Again, we have to partition the data into training and test sets, and we show another way to do this. The code below also shows how to choose parameters for the SVM model, using the tune.svm() function. In the run shown, the best parameters are gamma = 0.01 and cost = 10. Because sample() is not seeded here, the selected values can differ from run to run. These values can be passed to svm() through its gamma and cost arguments.

suppressPackageStartupMessages(library(e1071))

index <- 1:nrow(iris)
part_index <- sample(index, trunc(length(index)*80/100))
Train <- iris[part_index,]
Test <- iris[-part_index,]

#Choosing parameters
tuned <- tune.svm(Species~., data = Train, gamma = 10^(-6:-1), cost = 10^(1:2))
summary(tuned)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##   0.01   10
## 
## - best performance: 0.03333333 
## 
## - Detailed performance results:
##    gamma cost      error dispersion
## 1  1e-06   10 0.69166667 0.23586589
## 2  1e-05   10 0.69166667 0.23586589
## 3  1e-04   10 0.58333333 0.23895348
## 4  1e-03   10 0.10000000 0.05270463
## 5  1e-02   10 0.03333333 0.05826716
## 6  1e-01   10 0.04166667 0.05892557
## 7  1e-06  100 0.69166667 0.23586589
## 8  1e-05  100 0.58333333 0.23895348
## 9  1e-04  100 0.10000000 0.05270463
## 10 1e-03  100 0.03333333 0.05826716
## 11 1e-02  100 0.04166667 0.05892557
## 12 1e-01  100 0.05833333 0.06860605
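The best combination does not have to be read off the printed summary; the object returned by tune.svm() stores it. The following is a sketch under a few assumptions: the seed (123) is arbitrary and only there for reproducibility, and the names best, m_tuned, p_tuned and acc_tuned are introduced here for illustration. It repeats the same split and grid, then refits the model with the selected values.

```r
library(e1071)

set.seed(123)  # assumed seed, only so that the sketch is reproducible

# Same 80/20 split as above
part_index <- sample(nrow(iris), trunc(nrow(iris) * 80 / 100))
Train <- iris[part_index, ]
Test  <- iris[-part_index, ]

# Same parameter grid as above
tuned <- tune.svm(Species ~ ., data = Train,
                  gamma = 10^(-6:-1), cost = 10^(1:2))

# One-row data frame holding the gamma and cost with the lowest CV error
best <- tuned$best.parameters
print(best)

# Refit on the training set with the selected values
m_tuned <- svm(Species ~ ., data = Train,
               gamma = best$gamma, cost = best$cost)

# Share of correct predictions on the held-out 20%
p_tuned <- predict(m_tuned, Test[, -5])
acc_tuned <- mean(p_tuned == Test$Species)
print(acc_tuned)
```

The tuning object also carries a model already fitted with the best parameters in tuned$best.model, which can be passed to predict() directly.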

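After running the model 1,000 times, we find that SVM predicts the correct Species 96.1% of the time on average. Quite impressive, huh? Note that the loop below calls svm() with its default settings rather than the tuned gamma and cost.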

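# Holds one accuracy value per run
accuracy_df <- data.frame()

for (i in 1:1000){
  # splitdf() is not defined in this section; it is assumed to be the helper
  # used for the earlier models, returning the training and test sets as a list
  splits <- splitdf(iris, seed=4000+i, train_fraction = .8)
  
  Train <- as.data.frame(splits[1])
  Test <- as.data.frame(splits[2])
  
  # as.data.frame() on a list element prefixes the column names, so restore them
  names(Train) <- names(iris)
  names(Test) <- names(iris)
  
  # Fit with svm() defaults and predict on the test set (column 5 is Species)
  m <- svm(Species ~ ., data = Train)
  p <- predict(m, Test[,-5])
  # Share of correct predictions in this run
  accuracy_df[i,1] <- sum(p==Test$Species)/length(p)
}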

mean(accuracy_df[,1])
## [1] 0.9612667
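The mean alone hides how much accuracy varies from split to split. Below is a self-contained sketch of the same experiment that also reports the spread. It makes some assumptions: it does not use splitdf() but a seeded sample() call in its place, it runs only 100 repetitions to stay quick, and the names n_runs, accuracy and train_idx are introduced here. Its numbers will therefore not match the figure above exactly.

```r
library(e1071)

n_runs <- 100                 # assumed smaller count, to keep the sketch fast
accuracy <- numeric(n_runs)   # preallocated vector, one value per run

for (i in seq_len(n_runs)) {
  set.seed(4000 + i)          # one seed per run, mirroring the loop above
  train_idx <- sample(nrow(iris), trunc(nrow(iris) * 80 / 100))
  Train <- iris[train_idx, ]
  Test  <- iris[-train_idx, ]

  m <- svm(Species ~ ., data = Train)      # default settings, as above
  p <- predict(m, Test[, -5])
  accuracy[i] <- mean(p == Test$Species)   # share of correct predictions
}

print(mean(accuracy))    # average accuracy over the runs
print(sd(accuracy))      # run-to-run variability
print(range(accuracy))   # worst and best single run
```

With only 30 test rows per run, each misclassified flower moves a run's accuracy by about 3.3 percentage points, which is why the spread is worth reporting next to the mean.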

To be continued…