we will begin by randomly generating normally distributed observations that belong to two classes,

and then checking whether the classes are linearly separable (they are not).

set.seed(1)

x = matrix(rnorm(20*2), ncol=2)

y = c(rep(-1,10), rep(1,10))

x[y==1,] = x[y==1,] + 1

plot(x, col=(3-y))

next, we fit the support vector classifier.

for the svm() function to perform classification, we first must encode the response as a factor variable.

dat = data.frame(x=x, y=as.factor(y))

library(e1071)
# fitting the model:

svmfit = svm(y ~ ., data=dat, kernel="linear", cost=10, scale=FALSE)

plot(svmfit, dat)

the cost argument allows us to specify the cost of a violation to the margin.

when the cost argument is small, the margins will be wide and many support vectors will be on the margin or will violate it: high bias, low variance.

when the cost argument is large, the margins will be narrow and few support vectors will be on the margin or will violate it: low bias, high variance.
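
to see this trade-off directly, here is a quick sketch (not part of the original lab) that refits the classifier over a grid of cost values and reads off the total number of support vectors from the tot.nSV field of the fitted object:

# higher cost = narrower margin = fewer support vectors
sapply(c(0.01, 0.1, 1, 10, 100), function(C)
  svm(y ~ ., data=dat, kernel="linear", cost=C, scale=FALSE)$tot.nSV)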

support vectors (observations on the margin or on the wrong side of the margin) are plotted as crosses. these are the observations that affect the support vector classifier.

# which observations are support vectors
svmfit$index
## [1]  1  2  5  7 14 16 17
summary(svmfit)
## 
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10, 
##     scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  10 
##       gamma:  0.5 
## 
## Number of Support Vectors:  7
## 
##  ( 4 3 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  -1 1

the kernel used here is linear with cost=10.

there are 7 support vectors: 4 in one class and 3 in the other.
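
as an aside, the same per-class counts are stored on the fitted object:

svmfit$nSV  # support vectors per class, matching the ( 4 3 ) above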

let’s try a smaller cost value.

# fitting the model:

svmfit = svm(y ~ ., data=dat, kernel="linear", cost=0.1, scale=FALSE)

plot(svmfit, dat)

svmfit$index
##  [1]  1  2  3  4  5  7  9 10 12 13 14 15 16 17 18 20
summary(svmfit)
## 
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 0.1, 
##     scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.1 
##       gamma:  0.5 
## 
## Number of Support Vectors:  16
## 
##  ( 8 8 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  -1 1

with a lower cost we now have more support vectors (16): 8 in each class.

with more support vectors influencing the classifier, we will have higher bias but lower variance.

unfortunately, the svm() function does not directly output the coefficients of the linear decision boundary, nor the width of the margin.
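
for a linear kernel, though, we can reconstruct both from the fitted object ourselves: the coefficient vector of the hyperplane is t(coefs) %*% SV and the intercept is rho. a minimal sketch (assuming scale=FALSE, so the support vectors are on the original scale):

beta = drop(t(svmfit$coefs) %*% svmfit$SV)  # hyperplane coefficients
beta0 = svmfit$rho                          # intercept: boundary is beta . x - beta0 = 0
2 / sqrt(sum(beta^2))                       # width of the margin, 2/||beta||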

the e1071 library includes a built-in function, tune(), to perform cross-validation. by default it does 10-fold CV.

the following command indicates that we want to compare SVMs with a linear kernel over a range of values of the cost parameter.

set.seed(1)

tune.out = tune(svm, y~., data=dat, kernel="linear",
                ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))

summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.1
## 
## - best performance: 0.1 
## 
## - Detailed performance results:
##    cost error dispersion
## 1 1e-03  0.70  0.4216370
## 2 1e-02  0.70  0.4216370
## 3 1e-01  0.10  0.2108185
## 4 1e+00  0.15  0.2415229
## 5 5e+00  0.15  0.2415229
## 6 1e+01  0.15  0.2415229
## 7 1e+02  0.15  0.2415229

cost=0.1 results in the lowest cross-validation error rate.
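
the winning parameter and its cross-validation error can also be pulled out programmatically:

tune.out$best.parameters   # cost = 0.1
tune.out$best.performance  # cv error = 0.1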

the tune() function stores the best model obtained, which can be accessed as follows.

bestmodel = tune.out$best.model

summary(bestmodel)
## 
## Call:
## best.tune(method = svm, train.x = y ~ ., data = dat, ranges = list(cost = c(0.001, 
##     0.01, 0.1, 1, 5, 10, 100)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.1 
##       gamma:  0.5 
## 
## Number of Support Vectors:  16
## 
##  ( 8 8 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  -1 1

now we will generate test data.

xtest = matrix(rnorm(20*2), ncol=2)
ytest = sample(c(-1,1), 20, rep=TRUE)
xtest[ytest==1,] = xtest[ytest==1,] + 1

testdat = data.frame(x=xtest, y=as.factor(ytest))

now we predict the class labels of these test observations using predict().

ypred = predict(bestmodel, testdat)

table(predict=ypred, truth=testdat$y)
##        truth
## predict -1  1
##      -1 11  1
##      1   0  8

19 of the 20 test observations are correctly classified; 1 is misclassified.
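
equivalently, the test error rate as a single number (from the table above it works out to 1/20 = 0.05):

mean(ypred != testdat$y)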

what if we had instead used cost=0.01?

# low cost = wide margin = high bias/low variance

svmfit = svm(y ~ ., data=dat, kernel="linear", cost=0.01, scale=FALSE)

summary(svmfit)
## 
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 0.01, 
##     scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
##       gamma:  0.5 
## 
## Number of Support Vectors:  20
## 
##  ( 10 10 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  -1 1
plot(svmfit, dat)

table(svmfit$fitted, dat$y)
##     
##      -1  1
##   -1 10  5
##   1   0  5
train.error = mean(svmfit$fitted != dat$y)
train.error
## [1] 0.25
ypred = predict(svmfit, testdat)

table(predict=ypred, truth=testdat$y)
##        truth
## predict -1  1
##      -1 11  2
##      1   0  7
test.error = mean(ypred != testdat$y)
test.error
## [1] 0.1

now 2 test observations are misclassified.

25% training error but 10% test error: high bias, low variance due to the low cost/wide margin. this is not overfitting, but rather underfitting.

now consider a situation in which the two classes are linearly separable.

then we can find a separating hyperplane using the svm() function. we first further separate the two classes in our simulated data so that they are linearly separable.

x[y==1,] = x[y==1,] + 0.5

plot(x, col=(y+5)/2, pch=19)

now the observations are barely linearly separable.

let’s try the model with a large value of cost (narrow margin, low bias, high variance).

# high cost = narrow margin = low bias/high variance

dat = data.frame(x=x, y=as.factor(y))

svmfit = svm(y~., data=dat, kernel="linear", cost=1e5)

summary(svmfit)
## 
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 1e+05)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1e+05 
##       gamma:  0.5 
## 
## Number of Support Vectors:  3
## 
##  ( 1 2 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  -1 1
plot(svmfit, dat)

train.error = mean(svmfit$fitted != dat$y)
train.error
## [1] 0
ypred = predict(svmfit, testdat)

table(predict=ypred, truth=testdat$y)
##        truth
## predict -1  1
##      -1 11  2
##      1   0  7
test.error = mean(ypred != testdat$y)
test.error
## [1] 0.1

0% training error but 10% test error: overfitting due to the narrow margin (low bias, high variance) caused by the high cost.
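
as a closing sketch (not run above), refitting with a more moderate cost on the same barely-separable data should widen the margin, perhaps misclassifying a training observation or two, but with a better chance of generalizing:

svmfit = svm(y~., data=dat, kernel="linear", cost=1)
mean(svmfit$fitted != dat$y)                 # training error
mean(predict(svmfit, testdat) != testdat$y)  # test error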