I will use the BBBC data to show how to tune and estimate a support vector machine (SVM).

In the following code, I will get BBBC into R. I will then declare the dependent variable, Choice, to be a factor so that SVM doesn't treat it as a numeric variable. If I don't, svm() will run regression rather than classification.

library(e1071)
library(readxl)
## Warning: package 'readxl' was built under R version 3.2.4
bbbc <- read_excel("/Volumes/Transcend/Dropbox/Work/Teaching/DA 6813/Course Documents/Data/Bookbinders/BBBC-Train.xlsx")

bbbc$Choice <- factor(bbbc$Choice)

Next I split BBBC into 80% training data and 20% validation data.

set.seed(200)
tr_ind <- sample(nrow(bbbc),0.8*nrow(bbbc),replace = F)

bbtrain <- bbbc[tr_ind,]
bbtest <- bbbc[-tr_ind,]

The syntax for svm() from the package e1071 is as follows for the default kernel, which is the radial basis function (RBF):

svm(formula = , data = , gamma = , cost = )

Let's generate a formula. I will simply use the same model we used in estimating the logistic regression:

model1 <- Choice ~ as.factor(Gender) + Amount_purchased +
  Frequency + Last_purchase + First_purchase + P_Child +
  P_Youth + P_Cook + P_DIY + P_Art

Next we need gamma and cost in case we decide to use the RBF kernel. For a starting point, we can use the tune.svm() function from the package e1071. Be careful about the values you use in the seq() function: depending on your system, tuning is likely to take a long time to finish executing.

I will use gamma ranging from 0.01 to 0.1 in increments of 0.01. This gives us 10 values of gamma: {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10}.

I will also increase cost from 0.1 to 1 in increments of 0.1. This gives us 10 values of cost: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.

Thus, tune.svm will learn SVMs for 10 x 10 = 100 possible combinations of gamma and cost. Internally, tune.svm also uses 10-fold cross-validation to estimate the classification error, so for 100 combinations it will actually learn 10 x 100 = 1,000 SVMs. This is why it takes a long time to execute this function.
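
If the full 10-fold cross-validation is too slow on your machine, you can reduce the number of folds through the tunecontrol argument of tune.svm. Here is a minimal sketch (not run here) using 5-fold cross-validation, which roughly halves the number of SVMs that get fitted:

# Sketch: same grid as below, but with 5-fold instead of 10-fold cross-validation
tune5 <- tune.svm(model1, data = bbtrain,
                  gamma = seq(0.01, 0.1, by = 0.01),
                  cost = seq(0.1, 1, by = 0.1),
                  tunecontrol = tune.control(cross = 5))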

I am going to assign the output from tune.svm to an object called "tune".

tune <- tune.svm(model1, data = bbtrain, gamma = seq(0.01, 0.1, by = 0.01), cost = seq(0.1, 1, by = 0.1))

"tune" is a large list. It contains a lot of output, but at this point we are interested in knowing which parameter values for gamma and cost are the best. Note that "best" means best within the 100 combinations we decided to use for tuning; it doesn't tell us the best parameter set globally.

tune$best.parameters
##    gamma cost
## 81  0.01  0.9

For this example and the range of tuning parameters, I got the best value of gamma = 0.01 and the best value of cost = 0.9.

In case you would like to see the errors for the different combinations, you can take a look at the following data frame:

tune$performances
##     gamma cost     error dispersion
## 1    0.01  0.1 0.2484375 0.04699060
## 2    0.02  0.1 0.2484375 0.04699060
## 3    0.03  0.1 0.2484375 0.04699060
## 4    0.04  0.1 0.2476562 0.04731420
## 5    0.05  0.1 0.2476562 0.04731420
## 6    0.06  0.1 0.2476562 0.04731420
## 7    0.07  0.1 0.2476562 0.04731420
## 8    0.08  0.1 0.2484375 0.04699060
## 9    0.09  0.1 0.2484375 0.04699060
## 10   0.10  0.1 0.2484375 0.04699060
## 11   0.01  0.2 0.2484375 0.04699060
## 12   0.02  0.2 0.2414062 0.05082463
## 13   0.03  0.2 0.2328125 0.05257580
## 14   0.04  0.2 0.2273438 0.05355347
## 15   0.05  0.2 0.2281250 0.05321683
## 16   0.06  0.2 0.2281250 0.05582886
## 17   0.07  0.2 0.2273438 0.05329960
## 18   0.08  0.2 0.2257812 0.05329960
## 19   0.09  0.2 0.2257812 0.05175024
## 20   0.10  0.2 0.2265625 0.05103104
## 21   0.01  0.3 0.2460938 0.05039584
## 22   0.02  0.3 0.2281250 0.05296135
## 23   0.03  0.3 0.2234375 0.05563417
## 24   0.04  0.3 0.2195313 0.05329960
## 25   0.05  0.3 0.2164062 0.05400743
## 26   0.06  0.3 0.2140625 0.04943803
## 27   0.07  0.3 0.2171875 0.04896941
## 28   0.08  0.3 0.2179687 0.04878210
## 29   0.09  0.3 0.2195313 0.05042275
## 30   0.10  0.3 0.2203125 0.04952027
## 31   0.01  0.4 0.2343750 0.05362307
## 32   0.02  0.4 0.2210938 0.05706038
## 33   0.03  0.4 0.2164062 0.05234958
## 34   0.04  0.4 0.2132813 0.05182881
## 35   0.05  0.4 0.2156250 0.05145454
## 36   0.06  0.4 0.2171875 0.05244665
## 37   0.07  0.4 0.2179687 0.05278820
## 38   0.08  0.4 0.2218750 0.05025434
## 39   0.09  0.4 0.2210938 0.04969116
## 40   0.10  0.4 0.2179687 0.05028807
## 41   0.01  0.5 0.2273438 0.05317221
## 42   0.02  0.5 0.2179687 0.05455714
## 43   0.03  0.5 0.2148438 0.05301894
## 44   0.04  0.5 0.2156250 0.05079127
## 45   0.05  0.5 0.2171875 0.04938313
## 46   0.06  0.5 0.2187500 0.05036219
## 47   0.07  0.5 0.2195313 0.05055706
## 48   0.08  0.5 0.2195313 0.05148748
## 49   0.09  0.5 0.2195313 0.05148748
## 50   0.10  0.5 0.2203125 0.05006506
## 51   0.01  0.6 0.2234375 0.05440153
## 52   0.02  0.6 0.2171875 0.05296135
## 53   0.03  0.6 0.2156250 0.05011922
## 54   0.04  0.6 0.2156250 0.05011922
## 55   0.05  0.6 0.2164062 0.05182881
## 56   0.06  0.6 0.2187500 0.05063079
## 57   0.07  0.6 0.2195313 0.05148748
## 58   0.08  0.6 0.2203125 0.05244665
## 59   0.09  0.6 0.2203125 0.05244665
## 60   0.10  0.6 0.2195313 0.05253063
## 61   0.01  0.7 0.2187500 0.05834728
## 62   0.02  0.7 0.2148438 0.05224584
## 63   0.03  0.7 0.2164062 0.04858708
## 64   0.04  0.7 0.2179687 0.05161902
## 65   0.05  0.7 0.2187500 0.04968433
## 66   0.06  0.7 0.2195313 0.04947231
## 67   0.07  0.7 0.2203125 0.05153356
## 68   0.08  0.7 0.2203125 0.05244665
## 69   0.09  0.7 0.2210938 0.05350279
## 70   0.10  0.7 0.2210938 0.05299335
## 71   0.01  0.8 0.2171875 0.05878730
## 72   0.02  0.8 0.2140625 0.05339494
## 73   0.03  0.8 0.2171875 0.04952027
## 74   0.04  0.8 0.2164062 0.05103768
## 75   0.05  0.8 0.2187500 0.04968433
## 76   0.06  0.8 0.2187500 0.05063079
## 77   0.07  0.8 0.2203125 0.05113724
## 78   0.08  0.8 0.2218750 0.05314032
## 79   0.09  0.8 0.2234375 0.05377462
## 80   0.10  0.8 0.2226562 0.05428298
## 81   0.01  0.9 0.2117188 0.05492878
## 82   0.02  0.9 0.2156250 0.04984786
## 83   0.03  0.9 0.2171875 0.05113724
## 84   0.04  0.9 0.2164062 0.05103768
## 85   0.05  0.9 0.2179687 0.04919739
## 86   0.06  0.9 0.2187500 0.05063079
## 87   0.07  0.9 0.2203125 0.05205729
## 88   0.08  0.9 0.2234375 0.05377462
## 89   0.09  0.9 0.2234375 0.05352180
## 90   0.10  0.9 0.2242187 0.05337589
## 91   0.01  1.0 0.2117188 0.05492878
## 92   0.02  1.0 0.2148438 0.04971845
## 93   0.03  1.0 0.2171875 0.05113724
## 94   0.04  1.0 0.2171875 0.04952027
## 95   0.05  1.0 0.2179687 0.05095789
## 96   0.06  1.0 0.2187500 0.05022735
## 97   0.07  1.0 0.2234375 0.05339494
## 98   0.08  1.0 0.2234375 0.05352180
## 99   0.09  1.0 0.2234375 0.05352180
## 100  0.10  1.0 0.2234375 0.05489791

You can clearly see that for my values of gamma = 0.01 and cost = 0.9 the error is indeed the lowest, at 0.2117188 (tied with gamma = 0.01 and cost = 1.0).
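
Rather than scanning the whole table, you can also visualize the error over the grid. The plot method for tune objects draws a contour plot of the cross-validation error across gamma and cost:

# Contour plot of cross-validation error over the gamma/cost grid
plot(tune)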

Finally, using the values of the best parameters, let's run SVM. Note that as I am using the RBF kernel, the parameters of the SVM aren't useful for interpretation at all. Instead, the prediction is done using the support vectors. Thus, it's the observations, and not parameters associated with the variables, that are used in prediction. This is quite a departure from traditional parametric methods such as linear and logistic regression.

I am assigning the output of svm() to an object called mysvm. I am also printing the summary of mysvm, which gives some basic information. Take note of the number of support vectors.

Also note that rather than manually typing in the best values of gamma and cost, I am simply using the values stored in the "tune" object. In case you are wondering how I found them, go to the top right window in RStudio (Environment tab) and navigate to the "tune" object. Click on the blue icon next to it to expand "tune" and you will see where I got the names of the best parameters.
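
If you prefer the console to the Environment tab, you can pull the same information programmatically. For example:

# Components of the "tune" object
names(tune)

# Best parameter combination and its cross-validation error
tune$best.parameters
tune$best.performance

# tune$best.model holds an SVM already refit on bbtrain with the best parameters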

mysvm <- svm(formula=model1, data = bbtrain, gamma = tune$best.parameters$gamma, cost = tune$best.parameters$cost)
summary(mysvm)
## 
## Call:
## svm(formula = model1, data = bbtrain, gamma = tune$best.parameters$gamma, 
##     cost = tune$best.parameters$cost)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.9 
##       gamma:  0.01 
## 
## Number of Support Vectors:  633
## 
##  ( 321 312 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  0 1
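
To see the point made above, that prediction relies on specific training observations, you can inspect which rows of bbtrain ended up as support vectors:

# Indices of the training observations that act as support vectors
head(mysvm$index)

# The (scaled) support vectors themselves; the row count matches the summary above
nrow(mysvm$SV)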

Finally, using this model, let’s classify our test data and get the confusion matrix.

svmpredict <- predict(mysvm,bbtest, type = "response")

# Cross tabs or confusion matrix

table(pred=svmpredict,true=bbtest$Choice)
##     true
## pred   0   1
##    0 237  71
##    1   1  11
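
From this table you can compute overall accuracy and, more importantly for this problem, how well the model picks out buyers:

# Save the confusion matrix and compute overall accuracy
conf <- table(pred = svmpredict, true = bbtest$Choice)
sum(diag(conf)) / sum(conf)

# Sensitivity for buyers (Choice = 1): correctly predicted 1s among all true 1s
conf["1", "1"] / sum(conf[, "1"])

With the numbers above, overall accuracy is about 77.5%, but the model catches only 11 of the 82 actual buyers in the test set. Keep that in mind when you compare it to the logistic regression in the exercise below.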

As an exercise, you should compare this output to the logistic regression and see which model performs better. You can also tweak the SVM by using different kernels (linear, polynomial, or sigmoid). Note that the tuning parameters for these kernels are different from those of the RBF. You will find this in the documentation for e1071: https://cran.r-project.org/web/packages/e1071/e1071.pdf
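
As a starting point for that exercise, here is a sketch of how the kernel is switched. The cost, gamma, degree, and coef0 values below are placeholders for illustration, not tuned values; tune.svm accepts ranges for degree and coef0 as well, so the same tuning approach carries over.

# Sketch: the same model with alternative kernels (parameter values are placeholders)
svm_linear <- svm(formula = model1, data = bbtrain, kernel = "linear", cost = 1)
svm_poly <- svm(formula = model1, data = bbtrain, kernel = "polynomial",
                degree = 3, coef0 = 1, gamma = 0.01, cost = 1)
svm_sigmoid <- svm(formula = model1, data = bbtrain, kernel = "sigmoid",
                   coef0 = 0, gamma = 0.01, cost = 1)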