Q7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
set.seed(1)
tune.out = tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.1, 1, 5, 10), degree = c(2, 3, 4)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 10 2
##
## - best performance: 0.5130128
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.1 2 0.5511538 0.04366593
## 2 1.0 2 0.5511538 0.04366593
## 3 5.0 2 0.5511538 0.04366593
## 4 10.0 2 0.5130128 0.08963366
## 5 0.1 3 0.5511538 0.04366593
## 6 1.0 3 0.5511538 0.04366593
## 7 5.0 3 0.5511538 0.04366593
## 8 10.0 3 0.5511538 0.04366593
## 9 0.1 4 0.5511538 0.04366593
## 10 1.0 4 0.5511538 0.04366593
## 11 5.0 4 0.5511538 0.04366593
## 12 10.0 4 0.5511538 0.04366593
set.seed(1)
tune.out = tune(svm, mpglevel ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.1, 1, 5, 10), gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 10 0.01
##
## - best performance: 0.02557692
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 0.1 1e-02 0.08929487 0.04382379
## 2 1.0 1e-02 0.07403846 0.03522110
## 3 5.0 1e-02 0.04852564 0.03303346
## 4 10.0 1e-02 0.02557692 0.02093679
## 5 0.1 1e-01 0.07903846 0.03874545
## 6 1.0 1e-01 0.05371795 0.03525162
## 7 5.0 1e-01 0.02820513 0.03299190
## 8 10.0 1e-01 0.03076923 0.03375798
## 9 0.1 1e+00 0.55115385 0.04366593
## 10 1.0 1e+00 0.06384615 0.04375618
## 11 5.0 1e+00 0.05884615 0.04020934
## 12 10.0 1e+00 0.05884615 0.04020934
## 13 0.1 5e+00 0.55115385 0.04366593
## 14 1.0 5e+00 0.49493590 0.04724924
## 15 5.0 5e+00 0.48217949 0.05470903
## 16 10.0 5e+00 0.48217949 0.05470903
## 17 0.1 1e+01 0.55115385 0.04366593
## 18 1.0 1e+01 0.51794872 0.05063697
## 19 5.0 1e+01 0.51794872 0.04917316
## 20 10.0 1e+01 0.51794872 0.04917316
## 21 0.1 1e+02 0.55115385 0.04366593
## 22 1.0 1e+02 0.55115385 0.04366593
## 23 5.0 1e+02 0.55115385 0.04366593
## 24 10.0 1e+02 0.55115385 0.04366593
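For the polynomial kernel, the cross-validation error stays at about 0.551 for almost every combination of cost and degree; only cost = 10 with degree = 2 does slightly better, at 0.513. Raising the degree does not help, and the polynomial kernel does worse than the model in part (b). For the radial kernel, small values of gamma give the lowest errors: the best result is 0.0256 at cost = 10 and gamma = 0.01, gamma = 0.1 is close behind, and for gamma of 5 or more the error is between 0.48 and 0.55 at every cost.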
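To make the effect of gamma easier to see, the radial results can be summarized by the lowest cross-validation error reached at each gamma. The sketch below retypes the error column from the radial tuning table above into a data frame; the names `radial.cv` and `best.by.gamma` are made up for illustration. In a live session the same columns should be available in `tune.out$performances`.

```r
# CV errors copied by hand from the radial tuning table (rows 1 to 24).
# expand.grid varies cost fastest, which matches the row order of that table.
radial.cv = expand.grid(cost = c(0.1, 1, 5, 10),
                        gamma = c(0.01, 0.1, 1, 5, 10, 100))
radial.cv$error = c(0.08929487, 0.07403846, 0.04852564, 0.02557692,
                    0.07903846, 0.05371795, 0.02820513, 0.03076923,
                    0.55115385, 0.06384615, 0.05884615, 0.05884615,
                    0.55115385, 0.49493590, 0.48217949, 0.48217949,
                    0.55115385, 0.51794872, 0.51794872, 0.51794872,
                    0.55115385, 0.55115385, 0.55115385, 0.55115385)

# Lowest CV error achieved at each value of gamma.
best.by.gamma = tapply(radial.cv$error, radial.cv$gamma, min)
best.by.gamma

# Row of the grid with the overall lowest CV error.
radial.cv[which.min(radial.cv$error), ]
```

The per-gamma minimum is only a summary of the grid that was searched; a finer grid around gamma = 0.01 to 0.1 might shift the best value.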
(d) Make some plots to back up your assertions in (b) and (c).
svm.linear = svm(mpglevel ~ ., data = Auto, kernel = "linear", cost = 1)
svm.poly = svm(mpglevel ~ ., data = Auto, kernel = "polynomial", cost = 10,
degree = 2)
svm.radial = svm(mpglevel ~ ., data = Auto, kernel = "radial", cost = 10, gamma = 0.01)
plotpairs = function(fit) {
for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
}
}
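The loop in `plotpairs` builds one two-variable formula per predictor and hands it to the `plot()` method for `svm` fits, so each call produces one plot of `mpg` against a single predictor. The sketch below only reproduces the formula-building step. It assumes the column names of `Auto` are the usual ISLR ones plus the `mpglevel` column created earlier in this question; the vectors `auto.names` and `plot.formulas` are illustrative names, not part of the original code.

```r
# Column names assumed to match Auto after mpglevel was added in part (a).
auto.names = c("mpg", "cylinders", "displacement", "horsepower", "weight",
               "acceleration", "year", "origin", "name", "mpglevel")

# Same filter as in plotpairs(): drop the response-related and name columns.
predictors = auto.names[!(auto.names %in% c("mpg", "mpglevel", "name"))]

# One formula per remaining predictor, built exactly as in the loop body.
plot.formulas = lapply(predictors, function(name) {
  as.formula(paste("mpg~", name, sep = ""))
})

length(plot.formulas)
sapply(plot.formulas, deparse)
```

Under these assumptions each call to `plotpairs()` draws seven plots, one per predictor.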
plotpairs(svm.linear)
plotpairs(svm.poly)
plotpairs(svm.radial)
Q8. This problem involves the OJ data set which is part of the ISLR package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
library(ISLR)
set.seed(1)
train = sample(dim(OJ)[1], 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
library(e1071)
svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
With cost = 0.01, the support vector classifier uses 435 of the 800 training observations as support vectors, split almost evenly between the two classes (219 and 216).
(c) What are the training and test error rates?
train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 420 65
## MM 75 240
(75 + 65)/(420 + 65 + 75 + 240)
## [1] 0.175
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 153 15
## MM 33 69
(33 + 15)/(153 + 15 + 33 + 69)
## [1] 0.1777778
The training error rate is 0.175 while the test error rate is 0.178.
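The error rates above are the off-diagonal counts of each confusion table divided by the total. As a small sketch, the same arithmetic can be wrapped in a helper; `error.rate`, `conf.train`, and `conf.test` are hypothetical names, and the counts are retyped from the tables above rather than recomputed from the fitted model.

```r
# Misclassification rate = 1 - (correct predictions / all predictions).
error.rate = function(tab) 1 - sum(diag(tab)) / sum(tab)

# Confusion tables retyped from the output above (rows = actual, columns = predicted).
# matrix() fills column by column, so the first two numbers are the CH column.
conf.train = matrix(c(420, 75, 65, 240), nrow = 2,
                    dimnames = list(actual = c("CH", "MM"),
                                    predicted = c("CH", "MM")))
conf.test = matrix(c(153, 33, 15, 69), nrow = 2,
                   dimnames = list(actual = c("CH", "MM"),
                                   predicted = c("CH", "MM")))

error.rate(conf.train)
error.rate(conf.test)
```

In a live session the helper could be applied directly to `table(OJ.train$Purchase, train.pred)`, which avoids retyping the counts.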
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
set.seed(1554)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.3162278
##
## - best performance: 0.17125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.17750 0.06635343
## 2 0.01778279 0.17750 0.05916080
## 3 0.03162278 0.17500 0.06095308
## 4 0.05623413 0.17375 0.06755913
## 5 0.10000000 0.17625 0.06755913
## 6 0.17782794 0.17625 0.06573569
## 7 0.31622777 0.17125 0.06483151
## 8 0.56234133 0.17375 0.06573569
## 9 1.00000000 0.17250 0.06258328
## 10 1.77827941 0.17500 0.06997023
## 11 3.16227766 0.17250 0.06661456
## 12 5.62341325 0.17625 0.07155272
## 13 10.00000000 0.17875 0.07072295
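The cost grid passed to `tune()` is spaced evenly on a log scale, which is why the tuning table shows values such as 0.3162278 rather than round numbers. A minimal check of that grid, using only base R (the name `cost.grid` is illustrative):

```r
# Thirteen exponents from -2 to 1 in steps of 0.25, turned into costs.
cost.grid = 10^seq(-2, 1, by = 0.25)

length(cost.grid)   # number of candidate costs
range(cost.grid)    # smallest and largest cost

# Seventh value is 10^(-0.5), the cost selected by the tuning run above.
signif(cost.grid[7], 7)
```

Note that the cross-validation errors in the table differ by less than 0.01 across the whole grid, which is small next to the reported dispersion, so the selected cost should not be read as sharply determined.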
(e) Compute the training and test error rates using this new value for cost.
svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameters$cost)
train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 423 62
## MM 71 244
(71 + 62)/(423 + 62 + 71 + 244)
## [1] 0.16625
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 155 13
## MM 29 73
(29 + 13)/(155 + 13 + 29 + 73)
## [1] 0.1555556
With the tuned cost, the training error rate is 0.166 and the test error rate is 0.156.
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
set.seed(410)
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial")
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 373
##
## ( 188 185 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 441 44
## MM 77 238
(77 + 44)/(441 + 44 + 77 + 238)
## [1] 0.15125
test.pred = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 151 17
## MM 33 69
(33 + 17)/(151 + 17 + 33 + 69)
## [1] 0.1851852
With the radial kernel, the training error rate is 0.151 and the test error rate is 0.185.
set.seed(1)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.5623413
##
## - best performance: 0.16875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.39375 0.04007372
## 2 0.01778279 0.39375 0.04007372
## 3 0.03162278 0.35750 0.05927806
## 4 0.05623413 0.19500 0.02443813
## 5 0.10000000 0.18625 0.02853482
## 6 0.17782794 0.18250 0.03291403
## 7 0.31622777 0.17875 0.03230175
## 8 0.56234133 0.16875 0.02651650
## 9 1.00000000 0.17125 0.02128673
## 10 1.77827941 0.17625 0.02079162
## 11 3.16227766 0.17750 0.02266912
## 12 5.62341325 0.18000 0.02220485
## 13 10.00000000 0.18625 0.02853482
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial", cost = tune.out$best.parameters$cost)
train.pred = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 437 48
## MM 71 244
(71 + 48)/(437 + 48 + 71 + 244)
## [1] 0.14875
test.pred = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 150 18
## MM 30 72
(30 + 18)/(150 + 18 + 30 + 72)
## [1] 0.1777778
Now, after tuning, the training error is 0.149 and the test error is 0.178.
(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
set.seed(1)
svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 447
##
## ( 225 222 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 449 36
## MM 110 205
(110 + 36)/(449 + 36 + 110 + 205)
## [1] 0.1825
test.pred = predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 153 15
## MM 45 57
(45 + 15)/(153 + 15 + 45 + 57)
## [1] 0.2222222
The training and test error rates using an SVM with a polynomial kernel of degree 2 are 0.183 and 0.222, respectively.
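An alternative to adding up the off-diagonal cells by hand is to compare predicted and actual labels directly with `mean(pred != truth)`. The sketch below rebuilds label vectors that reproduce the training confusion table above, purely for illustration; `truth` and `pred` are stand-ins for `OJ.train$Purchase` and `train.pred`, not the real vectors.

```r
# Stand-in labels matching the training table: 485 actual CH, 315 actual MM.
truth = factor(rep(c("CH", "MM"), times = c(485, 315)))

# Predictions arranged so that 36 CH are called MM and 110 MM are called CH.
pred = factor(c(rep(c("CH", "MM"), times = c(449, 36)),
                rep(c("CH", "MM"), times = c(110, 205))))

table(truth, pred)   # same layout as the confusion table above
mean(pred != truth)  # share of observations that are misclassified
```

With the real objects, `mean(train.pred != OJ.train$Purchase)` should give the same training error without any manual arithmetic.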
set.seed(1)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2,
ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.1775
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.39125 0.04210189
## 2 0.01778279 0.37125 0.03537988
## 3 0.03162278 0.36500 0.03476109
## 4 0.05623413 0.33750 0.04714045
## 5 0.10000000 0.32125 0.05001736
## 6 0.17782794 0.24500 0.04758034
## 7 0.31622777 0.19875 0.03972562
## 8 0.56234133 0.20500 0.03961621
## 9 1.00000000 0.20250 0.04116363
## 10 1.77827941 0.18500 0.04199868
## 11 3.16227766 0.17750 0.03670453
## 12 5.62341325 0.18375 0.03064696
## 13 10.00000000 0.18125 0.02779513
svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2, cost = tune.out$best.parameters$cost)
train.pred = predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 451 34
## MM 90 225
(90 + 34)/(451 + 34 + 90 + 225)
## [1] 0.155
test.pred = predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 154 14
## MM 41 61
(41 + 14)/(154 + 14 + 41 + 61)
## [1] 0.2037037
After tuning, the training and test error rates are 0.155 and 0.204, respectively.
(h) Overall, which approach seems to give the best results on this data?
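The radial kernel gives the lowest training error after tuning (0.149, against 0.155 for the polynomial kernel and 0.166 for the linear kernel). On the test set, however, the tuned linear classifier does best (0.156, against 0.178 for radial and 0.204 for polynomial), so the radial kernel is the best option only if judged by training error.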
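As a rough check on this comparison, the error rates of the three tuned models can be put side by side. The sketch below retypes the misclassification counts from the confusion tables of the tuned linear, radial, and polynomial fits above (800 training and 270 test observations); the data frame name `tuned` is made up for illustration.

```r
# Misclassified counts taken from the tuned-model confusion tables above.
tuned = data.frame(
  kernel = c("linear", "radial", "polynomial"),
  train.error = c(133, 119, 124) / 800,
  test.error = c(42, 48, 55) / 270
)

# Models ordered from lowest to highest test error.
tuned[order(tuned$test.error), ]

# Kernel with the lowest error on each set.
as.character(tuned$kernel[which.min(tuned$train.error)])
as.character(tuned$kernel[which.min(tuned$test.error)])
```

These figures come from a single train/test split, so the ranking could change with a different random sample.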