In this project we will use Support Vector Machines (SVMs) on the iris dataset. SVMs are most often used for classification tasks, but they also perform well on related problems such as regression and novelty detection.

We will use the svm() function from the e1071 library to run the SVM supervised learning algorithm.
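
If the e1071 package is not installed yet, it can be installed from CRAN first; this one-time setup line is an addition to the original steps.

install.packages("e1071")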

library(e1071)

Load R’s iris dataset

data("iris")

Split the data into training and testing sets.

set.seed(1)  

Because we want 75% of the data in the training set, we create a random index covering 75% of the rows.

train = sample(1:nrow(iris), round(nrow(iris)*0.75)) 

We use this index to build the training and testing sets.

train_iris = iris[train, ]
test_iris = iris[-train,]
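
As a quick sanity check (these lines are an optional addition, not part of the original write-up), we can confirm the sizes of the two sets and how the species are spread across the training data.

nrow(train_iris)          # 112 rows, roughly 75% of the 150 observations
nrow(test_iris)           # the remaining 38 rows
table(train_iris$Species) # class counts in the training set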

When we call the svm() function, we classify the Species response variable using the other four predictors. Kernels are used to transform the data to a higher dimension so that it can be separated by hyperplanes; here we use the radial kernel in the svm() function. The gamma and cost arguments can be used to tune the operation of svm(): gamma is used by the kernel function, and cost specifies the cost of a violation of the margin. When cost is small, the margins will be wide, resulting in many support vectors.

svm1 = svm(Species ~., data = train_iris, kernel = 'radial', gamma = 0.1, cost = 10)
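
The gamma and cost values above are fixed by hand. If we wanted to pick them by cross-validation instead, e1071 provides a tune() function; the sketch below shows what such a grid search could look like (the grid values are only illustrative, and this step is not required for the rest of the analysis).

set.seed(1)
tune_out = tune(svm, Species ~ ., data = train_iris, kernel = "radial",
                ranges = list(cost = c(0.1, 1, 10, 100), gamma = c(0.01, 0.1, 1)))
summary(tune_out)        # cross-validation error for each cost/gamma combination
tune_out$best.parameters # the combination with the lowest CV error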

Let’s take a look at the fitted model

summary(svm1)
## 
## Call:
## svm(formula = Species ~ ., data = train_iris, kernel = "radial", 
##     gamma = 0.1, cost = 10)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  10 
## 
## Number of Support Vectors:  26
## 
##  ( 13 9 4 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica

From the summary of the svm model, we see that the model found 26 support vectors: 13 for setosa, 9 for versicolor and 4 for virginica. We can use the plot method for svm objects to visualize the support vectors, the decision boundary, and the margins of the model. We will use the Petal.Width and Petal.Length predictors to visualize a two-dimensional projection of the data, holding the other two predictors fixed with the slice argument.

plot(svm1, train_iris, Petal.Width ~ Petal.Length, slice = list(Sepal.Width=3, Sepal.Length=4))

From the plot we notice that our fitted model has properly separated the three species in this two-dimensional view of the data.
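
To back this up with a number (an optional check added here, not part of the original analysis), we can see how the model classifies the training data itself; train_pred is just a new name introduced for this check.

train_pred = predict(svm1, train_iris)
mean(train_pred == train_iris$Species) # proportion of training observations classified correctly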

Make Predictions

Now we use the predict function with the trained SVM model to make predictions using the test set.

pred = predict(svm1, test_iris)

Confusion Matrix

Next we use the table function to create a confusion matrix and check the accuracy of the model on the test set.

confusion_matrix = table(test_iris$Species, pred)
confusion_matrix
##             pred
##              setosa versicolor virginica
##   setosa         13          0         0
##   versicolor      0         13         1
##   virginica       0          0        11

From the confusion matrix, we see that there is only 1 misclassification (one versicolor flower predicted as virginica), and the test accuracy is (13 + 13 + 11)/38 ≈ 0.97.

Accuracy = (13+13+11)/38
Accuracy
## [1] 0.9736842
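
Equivalently, the same accuracy can be read straight off the confusion matrix without typing the counts by hand:

sum(diag(confusion_matrix)) / sum(confusion_matrix) # identical value, 37/38 ≈ 0.97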