We will use the orange juice (OJ) data from the ISLR package. Since the data has 1070 observations, we first randomize the rows and then use 800 observations as the training set and the remaining 270 as the test set.
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.4.2
attach(OJ)
set.seed(123)
oj_rand <- OJ[order(runif(1070)), ]   # shuffle the rows of OJ
train <- sample(nrow(oj_rand), 800)   # indices of the 800 training observations
oj_train <- oj_rand[train, ]
oj_test <- oj_rand[-train, ]          # remaining 270 observations form the test set
We will apply the support vector classifier to the training data with cost = 0.01, using Purchase as the response and the other variables as predictors, and then compare the linear, radial, and polynomial kernels.
library(e1071)
## Warning: package 'e1071' was built under R version 3.4.2
ojsvmlin<- svm(Purchase ~ ., data=oj_train, kernel="linear", cost=0.01)
summary(ojsvmlin)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 440
##
## ( 220 220 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The support vector classifier uses 440 support vectors out of the 800 training observations: 220 belong to Citrus Hill (CH) and 220 to Minute Maid (MM).
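As a quick sanity check, the support vector counts reported by summary() can also be read directly off the fitted object (these component names are from the e1071 svm object):
ojsvmlin$tot.nSV   # total number of support vectors (440)
ojsvmlin$nSV       # counts per class, in the order of the levels (CH, MM)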
train_pred <- predict(ojsvmlin, oj_train)
table(oj_train$Purchase, train_pred)
## train_pred
## CH MM
## CH 437 53
## MM 80 230
(80+53)/(437+53+80+230)
## [1] 0.16625
test_pred <- predict(ojsvmlin, oj_test)
table(oj_test$Purchase, test_pred)
## test_pred
## CH MM
## CH 145 18
## MM 29 78
(18+29)/(145+18+29+78)
## [1] 0.1740741
The training error rate is about 0.166 and the test error rate is about 0.174, so the two are fairly close.
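Rather than summing cells of the confusion matrix by hand, the same error rates can be computed directly from the predictions. This is a small convenience sketch; the helper name err_rate is ours, not part of the original analysis:
# Hypothetical helper: proportion of misclassified observations
err_rate <- function(model, data) {
  mean(predict(model, data) != data$Purchase)
}
err_rate(ojsvmlin, oj_train)   # should match the training error above (~0.166)
err_rate(ojsvmlin, oj_test)    # should match the test error above (~0.174)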
tune.out <- tune(svm, Purchase ~. , data=oj_train, kernel="linear", ranges=list(cost=10^seq(-2,1,by=0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.17375
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.17875 0.04566256
## 2 0.01778279 0.17625 0.04348132
## 3 0.03162278 0.17625 0.04505013
## 4 0.05623413 0.17500 0.04526159
## 5 0.10000000 0.17375 0.04543387
## 6 0.17782794 0.18000 0.04721405
## 7 0.31622777 0.18000 0.04297932
## 8 0.56234133 0.18000 0.04571956
## 9 1.00000000 0.18000 0.04297932
## 10 1.77827941 0.17875 0.04715886
## 11 3.16227766 0.18000 0.04377975
## 12 5.62341325 0.18250 0.04794383
## 13 10.00000000 0.18125 0.04903584
Cross-validation selects cost = 0.1 as the best value, with an estimated error of about 0.174.
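tune() stores the model refit at the best cost in tune.out$best.model, so the tuned linear classifier can be evaluated on the held-out data. The numbers this would produce are not shown in the original, so this is only a sketch of the follow-up step:
best_lin <- tune.out$best.model                          # linear SVM refit with cost = 0.1
mean(predict(best_lin, oj_train) != oj_train$Purchase)   # tuned training error
mean(predict(best_lin, oj_test) != oj_test$Purchase)     # tuned test error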
Next we apply the radial kernel, keeping gamma at its default value.
set.seed(234)
svm_radial <-svm(Purchase ~. , data=oj_train, kernel="radial")
summary(svm_radial)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.05555556
##
## Number of Support Vectors: 373
##
## ( 186 187 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_rad_pre <- predict(svm_radial, oj_train)
table(oj_train$Purchase, train_rad_pre)
## train_rad_pre
## CH MM
## CH 450 40
## MM 81 229
(40+81)/(490+310)
## [1] 0.15125
test_rad_pre <- predict(svm_radial, oj_test)
table(oj_test$Purchase, test_rad_pre)
## test_rad_pre
## CH MM
## CH 143 20
## MM 32 75
(20+32)/(143+20+32+75)
## [1] 0.1925926
With the radial kernel, the training error drops to about 0.151, but the test error increases to about 0.193.
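The radial fit above uses the default cost = 1. As with the linear kernel, one could cross-validate cost for the radial kernel as well; a sketch of that tuning step (not run in the original, so the selected cost and errors are not reported here):
set.seed(234)
tune_rad <- tune(svm, Purchase ~ ., data = oj_train, kernel = "radial",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune_rad)                                             # pick the cost with the lowest CV error
mean(predict(tune_rad$best.model, oj_test) != oj_test$Purchase)  # tuned radial test error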
svm_poly <-svm(Purchase~ ., kernel="polynomial" , data=oj_train, degree=2)
summary(svm_poly)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## gamma: 0.05555556
## coef.0: 0
##
## Number of Support Vectors: 433
##
## ( 214 219 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_poly_pre <- predict(svm_poly, oj_train)
table(oj_train$Purchase, train_poly_pre)
## train_poly_pre
## CH MM
## CH 461 29
## MM 111 199
(29+111)/(490+310)
## [1] 0.175
test_poly_pre <- predict(svm_poly, oj_test)
table(oj_test$Purchase, test_poly_pre)
## test_poly_pre
## CH MM
## CH 149 14
## MM 51 56
(14+51)/270
## [1] 0.2407407
With the degree-2 polynomial kernel, the test error increases to about 24%, while the training error is 0.175.
Ideally, tuning the cost for each kernel would reduce these error rates further; we leave that as practice (a sketch of the tuning step is shown below). An interesting observation is that, after tuning, the radial kernel tends to perform best most of the time, while tuning the polynomial kernel consistently reduces both the training and test error.
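For the practice step above, the cost of the polynomial kernel can be cross-validated the same way as the linear one. A minimal sketch, assuming the same cost grid as before (the object name tune_poly is ours, and the resulting errors are not shown in the original):
set.seed(345)
tune_poly <- tune(svm, Purchase ~ ., data = oj_train, kernel = "polynomial",
                  degree = 2, ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune_poly)                                                # CV error for each cost
mean(predict(tune_poly$best.model, oj_test) != oj_test$Purchase)  # tuned polynomial test error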