Orange Juice data for SVM

We will use the Orange Juice (OJ) data from the ISLR package. The data has 1070 observations, so we will randomize the order of the observations first and then use 800 as the training set and 270 as the test set.

library(ISLR)
## Warning: package 'ISLR' was built under R version 3.4.2
attach(OJ)
set.seed(123)
oj_rand<- OJ[order(runif(1070)),]
train <-sample(nrow(oj_rand),800)
oj_train<-oj_rand[train, ]
oj_test<-oj_rand[-train,]
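Incidentally, sample() already draws a random subset of rows, so the preliminary shuffle with order(runif()) is optional. A minimal sketch of an equivalent split (note it would yield a different random partition, so the numbers below would change):

set.seed(123)
train_idx <- sample(nrow(OJ), 800)  # 800 random row indices for training
oj_train2 <- OJ[train_idx, ]        # hypothetical alternative objects; the analysis below keeps oj_train/oj_test
oj_test2  <- OJ[-train_idx, ]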

Support Vector Classifier

We will apply the support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. We will then compare the linear, radial, and polynomial kernels.

1. Linear Kernel

library(e1071)
## Warning: package 'e1071' was built under R version 3.4.2
ojsvmlin<- svm(Purchase ~ ., data=oj_train, kernel="linear", cost=0.01)
summary(ojsvmlin)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "linear", 
##     cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
##       gamma:  0.05555556 
## 
## Number of Support Vectors:  440
## 
##  ( 220 220 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

The support vector classifier uses 440 support vectors out of the 800 training observations: 220 belong to Citrus Hill (CH) and 220 belong to Minute Maid (MM).
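This breakdown can also be checked programmatically: the fitted svm object stores the row indices of its support vectors in the index component. A small sketch, assuming the ojsvmlin model fitted above (and that no rows were dropped by na.omit):

length(ojsvmlin$index)                    # total number of support vectors (440 here)
table(oj_train$Purchase[ojsvmlin$index])  # how many support vectors come from each class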

Training and test set error with the linear kernel

training
train_pred <- predict(ojsvmlin, oj_train)
table(oj_train$Purchase, train_pred)
##     train_pred
##       CH  MM
##   CH 437  53
##   MM  80 230
(80+53)/(437+53+80+230)
## [1] 0.16625
test
test_pred <- predict(ojsvmlin, oj_test)
table(oj_test$Purchase, test_pred)
##     test_pred
##       CH  MM
##   CH 145  18
##   MM  29  78
(18+29)/(145+18+29+78)
## [1] 0.1740741

We can see that the training error is 0.166 and the test error is 0.174, which are quite close.
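Instead of summing the confusion-matrix cells by hand, the same error rates can be computed directly with mean(). A small sketch, assuming the objects created above:

mean(predict(ojsvmlin, oj_train) != oj_train$Purchase)  # training error, ~0.166
mean(predict(ojsvmlin, oj_test) != oj_test$Purchase)    # test error, ~0.174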

Use the tune() function to select an optimal cost (0.01 to 10)

tune.out <- tune(svm, Purchase ~. , data=oj_train, kernel="linear", ranges=list(cost=10^seq(-2,1,by=0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.1
## 
## - best performance: 0.17375 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.17875 0.04566256
## 2   0.01778279 0.17625 0.04348132
## 3   0.03162278 0.17625 0.04505013
## 4   0.05623413 0.17500 0.04526159
## 5   0.10000000 0.17375 0.04543387
## 6   0.17782794 0.18000 0.04721405
## 7   0.31622777 0.18000 0.04297932
## 8   0.56234133 0.18000 0.04571956
## 9   1.00000000 0.18000 0.04297932
## 10  1.77827941 0.17875 0.04715886
## 11  3.16227766 0.18000 0.04377975
## 12  5.62341325 0.18250 0.04794383
## 13 10.00000000 0.18125 0.04903584

We can see that the optimal cost is 0.1, with a cross-validation error of 0.17375.
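The tuned model does not have to be refit by hand: tune() stores the model refit at the best cost in the best.model component. A small sketch of applying it to the test set, assuming tune.out from above:

best_lin <- tune.out$best.model               # svm refit with cost = 0.1
best_test_pred <- predict(best_lin, oj_test)  # predictions on the held-out set
mean(best_test_pred != oj_test$Purchase)      # test error at the tuned cost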

2. Radial Kernel

When we apply the radial kernel, we keep the default parameter values: cost = 1 and gamma = 1/(data dimension), which is about 0.0556 here.

set.seed(234)
svm_radial <-svm(Purchase ~. , data=oj_train, kernel="radial")
summary(svm_radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.05555556 
## 
## Number of Support Vectors:  373
## 
##  ( 186 187 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Compare the training and test set error rates with the radial kernel

training set
train_rad_pre <- predict(svm_radial, oj_train)
table(oj_train$Purchase, train_rad_pre)
##     train_rad_pre
##       CH  MM
##   CH 450  40
##   MM  81 229
(40+81)/(490+310)
## [1] 0.15125
test set
test_rad_pre <- predict(svm_radial, oj_test)
table(oj_test$Purchase, test_rad_pre)
##     test_rad_pre
##       CH  MM
##   CH 143  20
##   MM  32  75
(20+32)/(143+20+32+75)
## [1] 0.1925926

With the radial kernel, the training error decreases to 0.151, but the test error increases to 0.193.
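The same tune() call can be applied with the radial kernel to pick its cost; a sketch over the same cost grid (results not shown, left for the reader to run):

set.seed(234)
tune_rad <- tune(svm, Purchase ~ ., data = oj_train, kernel = "radial",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune_rad)  # cross-validation error at each cost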

3. Polynomial Kernel

svm_poly <-svm(Purchase~ ., kernel="polynomial" , data=oj_train, degree=2)
summary(svm_poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "polynomial", 
##     degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  2 
##       gamma:  0.05555556 
##      coef.0:  0 
## 
## Number of Support Vectors:  433
## 
##  ( 214 219 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Polynomial prediction

training error
train_poly_pre <- predict(svm_poly, oj_train)
table(oj_train$Purchase, train_poly_pre)
##     train_poly_pre
##       CH  MM
##   CH 461  29
##   MM 111 199
(29+111)/(490+310)
## [1] 0.175
test error
test_poly_pre <- predict(svm_poly, oj_test)
table(oj_test$Purchase, test_poly_pre)
##     test_poly_pre
##       CH  MM
##   CH 149  14
##   MM  51  56
(14+51)/270
## [1] 0.2407407

As we can see, the test error increases to about 24%, the highest of the three kernels.

Ideally, tuning the cost for each kernel, as we did for the linear kernel, would further reduce the error rates; we leave that as an exercise. An interesting observation is that the tuned radial kernel performs best most of the time, while tuning the polynomial kernel consistently reduces both its training and test errors.
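A sketch of that exercise, tuning the cost for each kernel over the same grid and comparing the resulting test errors (the cost grid and the degree = 2 setting for the polynomial kernel are assumptions carried over from above; results not shown):

kernels <- c("linear", "radial", "polynomial")
test_errs <- sapply(kernels, function(k) {
  set.seed(123)
  # degree is only used by the polynomial kernel and is ignored otherwise
  tuned <- tune(svm, Purchase ~ ., data = oj_train, kernel = k, degree = 2,
                ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
  mean(predict(tuned$best.model, oj_test) != oj_test$Purchase)
})
test_errs  # test error for each kernel at its tuned cost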