SVM (support vector machine) is one of the most popular data mining techniques. It is a supervised classification algorithm that separates two classes by finding the hyperplane in feature space that divides them with the largest margin. The kernel trick implicitly maps the features into a higher-dimensional space, where a separating hyperplane can be found even when the classes are not linearly separable in the original space.
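As a minimal sketch of what the kernel choice looks like in practice (using the e1071 package introduced below; its svm function defaults to the radial basis kernel):

library(e1071)
# A linear kernel looks for a flat separating hyperplane in the original
# feature space; the radial kernel implicitly maps the features into a
# higher-dimensional space first.
m.linear <- svm(Species ~ ., data = iris, kernel = "linear")
m.radial <- svm(Species ~ ., data = iris, kernel = "radial")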
First, load the e1071 library for the svm function. Despite the legend that this was the room number where the algorithm was developed, the package actually takes its cryptic name from the department code of the Probability Theory Group at TU Wien, where it was created.
Then split the iris data into a test set (50 observations) and a training set (100 observations). Setting a seed makes the split reproducible.
library(e1071)   # provides svm()
library(printr)  # renders tables as markdown in knitr output
set.seed(1958)   # make the random split reproducible
train.indices <- sample(1:nrow(iris), 100)  # 100 of the 150 rows for training
train <- iris[train.indices, ]
test  <- iris[-train.indices, ]
Next, fit an SVM model on the training set and predict the species of the test observations. As the confusion matrix and accuracy below show, the results are good: 96% accuracy, with 48 of the 50 test observations classified correctly.
model <- svm(Species ~ ., data = train)
results <- predict(model, newdata = test)
table(results, test$Species)  # confusion matrix: predicted (rows) vs. actual (columns)
| results \ actual | setosa | versicolor | virginica |
|---|---|---|---|
| setosa | 16 | 0 | 0 |
| versicolor | 0 | 15 | 2 |
| virginica | 0 | 0 | 17 |
sum(results==test$Species)/length(results) # accuracy
## [1] 0.96
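As a quick aside (not part of the original walkthrough), the misclassified observations can be inspected directly:

test[results != test$Species, ]  # the two virginica predicted as versicolor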
SVM is effective in high-dimensional spaces. It can perform well even when the number of features exceeds the number of examples; however, if the number of features is much greater than the number of examples, performance may suffer.
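If that happens, grid-searching the cost and gamma hyperparameters often helps; e1071 ships tune.svm for exactly this. A minimal sketch on the same training split (the parameter grid here is an arbitrary choice, not from the original tutorial):

set.seed(1958)
# 10-fold cross-validation (tune's default) over a grid of gamma and cost.
tuned <- tune.svm(Species ~ ., data = train, gamma = 10^(-3:1), cost = 10^(0:3))
summary(tuned)    # cross-validation error for each parameter combination
tuned$best.model  # the svm model refit with the best parameter combination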