Support Vector Machines

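SVM (support vector machine) is one of the most popular data mining techniques. It is a supervised classification algorithm. For two classes, it finds the hyperplane in feature space that separates them with the widest possible margin. The training points that lie on or inside that margin are the support vectors, and a cost parameter controls how many margin violations are tolerated. When the classes are not linearly separable, the kernel trick implicitly maps the features into a higher-dimensional space, where a separating hyperplane may exist, without ever computing that mapping explicitly. Problems with more than two classes, such as the three iris species below, are handled by combining several two-class SVMs.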
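The following base-R sketch illustrates the kernel trick. It assumes a degree-2 polynomial kernel k(x, y) = (x · y)^2 on two-dimensional inputs, which is e1071's polynomial kernel with gamma = 1, coef0 = 0 and degree = 2. This kernel was chosen only for illustration; svm() defaults to the radial kernel. The kernel value equals the inner product of the explicit 3-D feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). This means an SVM can work in that feature space while only ever evaluating k. The names `poly_kernel` and `phi` are hypothetical helpers.

```r
# Degree-2 polynomial kernel for 2-D inputs: k(x, y) = (x . y)^2
poly_kernel <- function(x, y) sum(x * y)^2

# Explicit feature map into 3-D space whose inner product reproduces k
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

x <- c(1, 2)
y <- c(3, 4)

implicit <- poly_kernel(x, y)      # computed without ever building phi(x)
explicit <- sum(phi(x) * phi(y))   # inner product in the 3-D feature space

print(c(implicit = implicit, explicit = explicit))
```

Both values are 121 here, up to floating-point rounding. For the radial kernel, the corresponding feature space is infinite-dimensional, which is why evaluating k directly matters.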

e1071

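First, load the e1071 package, which provides the svm function (an interface to the LIBSVM library). The package name has nothing to do with the algorithm itself. It is the former designation (E1071) of the Probability Theory Group in the Department of Statistics at TU Wien, where the package originated.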
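To make the separating hyperplane concrete, this sketch fits a linear-kernel SVM to two linearly separable iris species (setosa vs. versicolor) and recovers the hyperplane normal w from the fitted object. It relies on e1071's documented fields: `coefs` holds the support-vector coefficients multiplied by the training labels, and `rho` holds the negative intercept. The manual decision values x · w - rho should then match those returned by predict(). The option `scale = FALSE` keeps w in the original units. The variable names `two`, `X`, `w` and `manual` are illustrative only.

```r
library(e1071)

# Keep two species; droplevels() removes the now-empty "virginica" level
two <- droplevels(subset(iris, Species != "virginica"))
X <- as.matrix(two[, 1:4])
y <- two$Species

# Linear kernel without internal scaling, so support vectors stay in original units
fit <- svm(x = X, y = y, kernel = "linear", scale = FALSE)

# Hyperplane normal: w = sum over support vectors of (alpha_i * y_i) * x_i
w <- t(fit$coefs) %*% fit$SV

# Decision value for each row: x . w - rho
manual <- drop(X %*% t(w)) - fit$rho

# Decision values computed by LIBSVM itself
pred <- predict(fit, X, decision.values = TRUE)
dv <- drop(attr(pred, "decision.values"))

print(fit$tot.nSV)                                        # number of support vectors
print(all.equal(manual, dv, check.attributes = FALSE))    # manual vs. LIBSVM
```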

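Then split the iris data into a test set (50 observations) and a training set (100 observations). Setting a seed makes the random split, and therefore the results, reproducible for a given R version. The default sampling algorithm changed in R 3.6.0, so older versions produce a different split.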

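library(e1071)   # svm()
library(printr)  # optional: renders R output as formatted tables in knitr documents

set.seed(1958)
train.indices <- sample(1:nrow(iris), 100)  # 100 random row numbers for training
train <- iris[train.indices, ]
test <- iris[-train.indices, ]              # the remaining 50 rows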

Build the model and test the results

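The results are good: 48 of the 50 test observations are classified correctly, for 96% accuracy. With no extra arguments, svm() fits a C-classification model with a radial basis (RBF) kernel and cost = 1. It also scales each feature to zero mean and unit variance before training.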

model <- svm(Species ~ ., data=train)            # predict Species from all four measurements
results <- predict(object=model, newdata=test)   # predicted species for each test row

table(results, test$Species)

predicted \ actual   setosa   versicolor   virginica
setosa                   16            0           0
versicolor                0           15           2
virginica                 0            0          17
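Accuracy alone does not show where the errors fall. Taking the confusion matrix above as given, with rows as predicted species and columns as actual species, this base-R sketch computes per-class precision and recall. The names `cm`, `precision` and `recall` are illustrative.

```r
# Confusion matrix as reported above: rows = predicted, columns = actual
species <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(16,  0,  0,
                0, 15,  2,
                0,  0, 17),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = species, actual = species))

precision <- diag(cm) / rowSums(cm)   # correct / everything predicted as that class
recall    <- diag(cm) / colSums(cm)   # correct / everything truly in that class

round(rbind(precision, recall), 3)
```

Setosa is classified perfectly. The two errors are virginica flowers predicted as versicolor, so virginica recall is 17/19 (about 0.895) and versicolor precision is 15/17 (about 0.882).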

sum(results==test$Species)/length(results) # accuracy

## [1] 0.96
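Although an SVM separates two classes, iris has three. e1071's svm() handles this with LIBSVM's one-against-one approach: it trains one binary classifier per pair of classes (three pairs here) and predicts by voting. This sketch repeats the split above and exposes the pairwise decision values through `predict(..., decision.values = TRUE)`. The name `dv` is illustrative.

```r
library(e1071)

set.seed(1958)
train.indices <- sample(1:nrow(iris), 100)
train <- iris[train.indices, ]
test  <- iris[-train.indices, ]

model <- svm(Species ~ ., data = train)

# decision.values = TRUE also returns the raw output of every pairwise classifier
results <- predict(model, newdata = test, decision.values = TRUE)
dv <- attr(results, "decision.values")

colnames(dv)        # one column per pair of species
head(round(dv, 3))  # first few test rows
```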

Pros and cons

SVM is effective in high-dimensional spaces. It can still perform well when the number of features exceeds the number of examples. However, if the number of features is much greater, performance may suffer unless the kernel and the regularization parameter (cost) are chosen carefully to avoid overfitting.
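Because performance depends on the kernel parameters and the cost, a cross-validated grid search is a common way to choose them. Below is a minimal sketch using e1071's tune.svm(), which by default scores each parameter pair by 10-fold cross-validated misclassification error. The grid values are arbitrary assumptions, and the selected pair may vary with the seed and R version.

```r
library(e1071)

set.seed(1958)
train <- iris[sample(1:nrow(iris), 100), ]

# Grid over the RBF width (gamma) and the soft-margin penalty (cost)
tuned <- tune.svm(Species ~ ., data = train,
                  gamma = 10^(-2:0), cost = 10^(0:2))

tuned$best.parameters    # gamma/cost pair with the lowest CV error
tuned$best.performance   # that CV error
best.model <- tuned$best.model   # refit on all training data with those values
```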