A support vector classifier combined with a non-linear kernel is called a support vector machine (SVM).
We will use two types of kernel here:
polynomial (tuning parameter: degree) and radial (tuning parameter: gamma).
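As a minimal sketch of how the two kernels are requested from svm() (assuming the data frame dat with factor response y that we construct below; the degree, gamma, and cost values here are placeholders, not tuned choices):
library(e1071)
# polynomial kernel: flexibility controlled by 'degree'
svm.poly = svm(y~., data=dat, kernel="polynomial", degree=3, cost=1)
# radial kernel: flexibility controlled by 'gamma'
svm.rad = svm(y~., data=dat, kernel="radial", gamma=1, cost=1)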
set.seed(1)
x = matrix(rnorm(200*2), ncol=2)       # 200 observations on 2 predictors
x[1:100,] = x[1:100,] + 2              # shift observations 1-100 up and right
x[101:150,] = x[101:150,] - 2          # shift observations 101-150 down and left
y = c(rep(1,150), rep(2,50))           # class 1: obs 1-150, class 2: obs 151-200
dat = data.frame(x=x, y=as.factor(y))
plot(x, col=y, pch=19)
Plotting the data makes it clear that the class boundary is indeed non-linear.
Now we randomly split the data into training and test sets,
and fit the training data using the svm() function with a radial kernel and gamma = 1:
library(e1071)
set.seed(12)
train = sample(200,100)
svmfit = svm(y~., data=dat[train,], kernel="radial", gamma=1,cost=1)
plot(svmfit, dat[train,])
summary(svmfit)
##
## Call:
## svm(formula = y ~ ., data = dat[train, ], kernel = "radial",
## gamma = 1, cost = 1)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 1
##
## Number of Support Vectors: 40
##
## ( 24 16 )
##
##
## Number of Classes: 2
##
## Levels:
## 1 2
train.error = mean(svmfit$fitted != dat[train,]$y)
train.error
## [1] 0.09
ypred = predict(svmfit, dat[-train,])
table(predict=ypred, truth=dat[-train,]$y)
## truth
## predict 1 2
## 1 70 6
## 2 5 19
test.error = mean(ypred != dat[-train,]$y)
test.error
## [1] 0.11
The plot shows that the resulting SVM has a decidedly non-linear boundary.
The training error is 9% and the test error is 11%.
There is a fair amount of training error. We could increase the value of cost to reduce it, but that would come at the price of a more irregular decision boundary that risks overfitting the data.
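As a rough illustration of that trade-off (the value cost = 1e5 below is an arbitrary choice, not one used in the analysis above), refitting with a very large cost should drive the training error down while producing a much more irregular boundary:
# sketch: an arbitrarily large cost gives a wigglier boundary and fewer training errors
svmfit.highcost = svm(y~., data=dat[train,], kernel="radial", gamma=1, cost=1e5)
plot(svmfit.highcost, dat[train,])
mean(svmfit.highcost$fitted != dat[train,]$y)   # training error should be lower than 0.09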
Next we perform cross-validation with tune() to choose the best values of cost and gamma:
set.seed(1)
tune.out = tune(svm, y~., data=dat[train,], kernel="radial",
ranges=list(cost=c(0.1, 1, 10, 100, 1000),
gamma=c(0.5, 1, 2, 3, 4)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 10 0.5
##
## - best performance: 0.12
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-01 0.5 0.25 0.17159384
## 2 1e+00 0.5 0.14 0.12649111
## 3 1e+01 0.5 0.12 0.09189366
## 4 1e+02 0.5 0.12 0.11352924
## 5 1e+03 0.5 0.20 0.16996732
## 6 1e-01 1.0 0.25 0.17159384
## 7 1e+00 1.0 0.12 0.09189366
## 8 1e+01 1.0 0.13 0.09486833
## 9 1e+02 1.0 0.19 0.14491377
## 10 1e+03 1.0 0.23 0.17029386
## 11 1e-01 2.0 0.26 0.17126977
## 12 1e+00 2.0 0.14 0.08432740
## 13 1e+01 2.0 0.16 0.12649111
## 14 1e+02 2.0 0.24 0.17126977
## 15 1e+03 2.0 0.23 0.17029386
## 16 1e-01 3.0 0.25 0.17159384
## 17 1e+00 3.0 0.14 0.10749677
## 18 1e+01 3.0 0.20 0.13333333
## 19 1e+02 3.0 0.24 0.17126977
## 20 1e+03 3.0 0.25 0.18408935
## 21 1e-01 4.0 0.25 0.17159384
## 22 1e+00 4.0 0.15 0.11785113
## 23 1e+01 4.0 0.22 0.14757296
## 24 1e+02 4.0 0.22 0.16865481
## 25 1e+03 4.0 0.25 0.18408935
The best values are cost = 10 and gamma = 0.5.
train.error = mean(tune.out$best.model$fitted != dat[train,]$y)
train.error
## [1] 0.09
plot(tune.out$best.model, data = dat[train,])
pred=predict(tune.out$best.model, newdata=dat[-train,])
table(true=dat[-train,"y"], pred=pred)
## pred
## true 1 2
## 1 70 5
## 2 5 20
test.error = mean(pred != dat[-train,]$y)
test.error
## [1] 0.1
With the tuned parameters, the training error is 9% and the test error improves slightly to 10%.
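We named the polynomial kernel at the start as the other option. As a sketch of the same tune() workflow with degree in place of gamma (the cost and degree grids below are illustrative choices, not values from the analysis above):
set.seed(1)
tune.poly = tune(svm, y~., data=dat[train,], kernel="polynomial",
                 ranges=list(cost=c(0.1, 1, 10, 100),
                             degree=c(2, 3, 4)))
summary(tune.poly)
poly.pred = predict(tune.poly$best.model, newdata=dat[-train,])
mean(poly.pred != dat[-train,]$y)   # test error of the tuned polynomial SVM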