We will use the orange juice (OJ) data from the ISLR package. Since the data has 1070 observations, we first randomize the rows and then use 800 observations as the training set and the remaining 270 as the test set.
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.4.2
attach(OJ)
set.seed(123)
oj_rand <- OJ[order(runif(1070)), ]   # shuffle the rows of OJ
train <- sample(nrow(oj_rand), 800)   # indices of the 800 training observations
oj_train <- oj_rand[train, ]
oj_test <- oj_rand[-train, ]          # remaining 270 observations form the test set
We will apply the support vector classifier to the training data with cost = 0.01, using Purchase as the response and the other variables as predictors, and then compare the linear, radial, and polynomial kernels.
library(e1071)
## Warning: package 'e1071' was built under R version 3.4.2
ojsvmlin<- svm(Purchase ~ ., data=oj_train, kernel="linear", cost=0.01)
summary(ojsvmlin)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 440
##
## ( 220 220 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The support vector classifier uses 440 support vectors out of the 800 training observations: 220 belong to Citrus Hill (CH) and 220 to Minute Maid (MM).
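As a quick sanity check, the support vector counts reported by summary() can also be read directly off the fitted object (these component names are from the e1071 svm object):
ojsvmlin$tot.nSV   # total number of support vectors (440)
ojsvmlin$nSV       # counts per class, in the order of the levels (CH, MM)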
train_pred <- predict(ojsvmlin, oj_train)
table(oj_train$Purchase, train_pred)
## train_pred
## CH MM
## CH 437 53
## MM 80 230
(80+53)/(437+53+80+230)
## [1] 0.16625
test_pred <- predict(ojsvmlin, oj_test)
table(oj_test$Purchase, test_pred)
## test_pred
## CH MM
## CH 145 18
## MM 29 78
(18+29)/(145+18+29+78)
## [1] 0.1740741
The training error rate is about 0.166 and the test error rate is about 0.174, so the two are fairly close.
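Rather than summing cells of the confusion matrix by hand, the same error rates can be computed directly from the predictions. This is a small convenience sketch; the helper name err_rate is ours, not part of the original analysis:
# Hypothetical helper: proportion of misclassified observations
err_rate <- function(model, data) {
  mean(predict(model, data) != data$Purchase)
}
err_rate(ojsvmlin, oj_train)   # should match the training error above (~0.166)
err_rate(ojsvmlin, oj_test)    # should match the test error above (~0.174)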
tune.out <- tune(svm, Purchase ~. , data=oj_train, kernel="linear", ranges=list(cost=10^seq(-2,1,by=0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.17375
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.17875 0.04566256
## 2 0.01778279 0.17625 0.04348132
## 3 0.03162278 0.17625 0.04505013
## 4 0.05623413 0.17500 0.04526159
## 5 0.10000000 0.17375 0.04543387
## 6 0.17782794 0.18000 0.04721405
## 7 0.31622777 0.18000 0.04297932
## 8 0.56234133 0.18000 0.04571956
## 9 1.00000000 0.18000 0.04297932
## 10 1.77827941 0.17875 0.04715886
## 11 3.16227766 0.18000 0.04377975
## 12 5.62341325 0.18250 0.04794383
## 13 10.00000000 0.18125 0.04903584
Cross-validation selects cost = 0.1 as the best value, with an estimated error of about 0.174.
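tune() stores the model refit at the best cost in tune.out$best.model, so the tuned linear classifier can be evaluated on the held-out data. The numbers this would produce are not shown in the original, so this is only a sketch of the follow-up step:
best_lin <- tune.out$best.model                          # linear SVM refit with cost = 0.1
mean(predict(best_lin, oj_train) != oj_train$Purchase)   # tuned training error
mean(predict(best_lin, oj_test) != oj_test$Purchase)     # tuned test error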
Next we apply the radial kernel, keeping gamma at its default value.
set.seed(234)
svm_radial <-svm(Purchase ~. , data=oj_train, kernel="radial")
summary(svm_radial)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.05555556
##
## Number of Support Vectors: 373
##
## ( 186 187 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_rad_pre <- predict(svm_radial, oj_train)
table(oj_train$Purchase, train_rad_pre)
## train_rad_pre
## CH MM
## CH 450 40
## MM 81 229
(40+81)/(490+310)
## [1] 0.15125
test_rad_pre <- predict(svm_radial, oj_test)
table(oj_test$Purchase, test_rad_pre)
## test_rad_pre
## CH MM
## CH 143 20
## MM 32 75
(20+32)/(143+20+32+75)
## [1] 0.1925926
With the radial kernel, the training error drops to about 0.151, but the test error increases to about 0.193.
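The radial fit above uses the default cost = 1. As with the linear kernel, one could cross-validate cost for the radial kernel as well; a sketch of that tuning step (not run in the original, so the selected cost and errors are not reported here):
set.seed(234)
tune_rad <- tune(svm, Purchase ~ ., data = oj_train, kernel = "radial",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune_rad)                                             # pick the cost with the lowest CV error
mean(predict(tune_rad$best.model, oj_test) != oj_test$Purchase)  # tuned radial test error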
svm_poly <-svm(Purchase~ ., kernel="polynomial" , data=oj_train, degree=2)
summary(svm_poly)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## gamma: 0.05555556
## coef.0: 0
##
## Number of Support Vectors: 433
##
## ( 214 219 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_poly_pre <- predict(svm_poly, oj_train)
table(oj_train$Purchase, train_poly_pre)
## train_poly_pre
## CH MM
## CH 461 29
## MM 111 199
(29+111)/(490+310)
## [1] 0.175
test_poly_pre <- predict(svm_poly, oj_test)
table(oj_test$Purchase, test_poly_pre)
## test_poly_pre
## CH MM
## CH 149 14
## MM 51 56
(14+51)/270
## [1] 0.2407407
With the degree-2 polynomial kernel, the test error increases to about 24%, while the training error is 0.175.
Ideally, tuning the cost for each kernel would reduce these error rates further; we leave that as practice (a sketch of the tuning step is shown below). An interesting observation is that, after tuning, the radial kernel tends to perform best most of the time, while tuning the polynomial kernel consistently reduces both the training and test error.
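For the practice step above, the cost of the polynomial kernel can be cross-validated the same way as the linear one. A minimal sketch, assuming the same cost grid as before (the object name tune_poly is ours, and the resulting errors are not shown in the original):
set.seed(345)
tune_poly <- tune(svm, Purchase ~ ., data = oj_train, kernel = "polynomial",
                  degree = 2, ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune_poly)                                                # CV error for each cost
mean(predict(tune_poly$best.model, oj_test) != oj_test$Purchase)  # tuned polynomial test error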