library(e1071)
library(ISLR)

Problem 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

set.seed(123)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)

(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

plot(x1[y == 0], x2[y == 0], col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(x1[y == 1], x2[y == 1], col = "purple", pch = 4)

(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.

lm.fit = glm(y ~ x1 + x2, family = binomial)
summary(lm.fit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.227  -1.200   1.133   1.157   1.188  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.04792    0.08949   0.535    0.592
## x1          -0.03999    0.31516  -0.127    0.899
## x2           0.11509    0.30829   0.373    0.709
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.86  on 499  degrees of freedom
## Residual deviance: 692.71  on 497  degrees of freedom
## AIC: 698.71
## 
## Number of Fisher Scoring iterations: 3

(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

data = data.frame(x1 = x1, x2 = x2, y = y)
lm.prob = predict(lm.fit, data, type = "response")
# a cutoff slightly above 0.5 is used; the fitted probabilities all lie close
# to 0.5, so the exact cutoff only shifts where the linear boundary falls
lm.pred = ifelse(lm.prob > 0.52, 1, 0)
data.pos = data[lm.pred == 1, ]
data.neg = data[lm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "purple", pch = 4)

(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1×X2, log(X2), and so forth).

lm.fit = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), data = data, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
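
These warnings are expected here: because the true boundary is quadratic, the expanded model separates the two classes almost perfectly, so the maximum-likelihood estimates are pushed toward very large values. As an optional check (output not shown), the inflated coefficient estimates and standard errors can be inspected directly:

summary(lm.fit)$coefficients
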
(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

lm.prob = predict(lm.fit, data, type = "response")
lm.pred = ifelse(lm.prob > 0.5, 1, 0)
data.pos = data[lm.pred == 1, ]
data.neg = data[lm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "purple", pch = 4)

(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm.fit = svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "purple", pch = 4)

(h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

# kernel = "radial" is the default in svm(), so this fits a radial-kernel SVM
svm.fit = svm(as.factor(y) ~ x1 + x2, data, gamma = 1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "purple", pch = 4)

(i) Comment on your results.

We see that a non-linear decision boundary can be obtained either by adding non-linear transformations of the features to a logistic regression or by using a non-linear (here radial) kernel in an SVM. The plain logistic regression and the linear support vector classifier both produce linear boundaries and fail to capture the true quadratic boundary.

Problem 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

gas_m = median(Auto$mpg)
new_v = ifelse(Auto$mpg > gas_m, 1, 0)
Auto$mpglevel = as.factor(new_v)
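
As an optional sanity check (output not shown), the median split should produce two roughly equal-sized classes:

table(Auto$mpglevel)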

(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

set.seed(123)
tune.out = tune(svm, mpglevel ~ ., data = Auto, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.01025641 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.07634615 0.03928191
## 2 1e-01 0.04333333 0.03191738
## 3 1e+00 0.01025641 0.01792836
## 4 5e+00 0.01538462 0.01792836
## 5 1e+01 0.01788462 0.01727588
## 6 1e+02 0.03320513 0.02720447

We see the lowest cross-validation error (about 1.0%) when cost = 1.
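
As an additional check (not required by the question), the cross-validation error can be plotted against cost using the performances data frame that tune() returns:

# CV error as a function of cost; the x-axis is on a log scale because the
# candidate costs span several orders of magnitude
plot(tune.out$performances$cost, tune.out$performances$error, type = "b",
    log = "x", xlab = "cost", ylab = "CV error")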

(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

set.seed(123)
tune.out = tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.1, 1, 5, 10), degree = c(2, 3, 4)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##    10      2
## 
## - best performance: 0.5714744 
## 
## - Detailed performance results:
##    cost degree     error dispersion
## 1   0.1      2 0.5817308 0.04740051
## 2   1.0      2 0.5817308 0.04740051
## 3   5.0      2 0.5817308 0.04740051
## 4  10.0      2 0.5714744 0.04575370
## 5   0.1      3 0.5817308 0.04740051
## 6   1.0      3 0.5817308 0.04740051
## 7   5.0      3 0.5817308 0.04740051
## 8  10.0      3 0.5817308 0.04740051
## 9   0.1      4 0.5817308 0.04740051
## 10  1.0      4 0.5817308 0.04740051
## 11  5.0      4 0.5817308 0.04740051
## 12 10.0      4 0.5817308 0.04740051
set.seed(123)
tune.out = tune(svm, mpglevel ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.1, 1, 5, 10), gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##    10  0.01
## 
## - best performance: 0.02032051 
## 
## - Detailed performance results:
##    cost gamma      error dispersion
## 1   0.1 1e-02 0.08916667 0.04345384
## 2   1.0 1e-02 0.07378205 0.04185248
## 3   5.0 1e-02 0.04589744 0.03136327
## 4  10.0 1e-02 0.02032051 0.02305327
## 5   0.1 1e-01 0.07634615 0.03928191
## 6   1.0 1e-01 0.05852564 0.03960325
## 7   5.0 1e-01 0.03057692 0.02611396
## 8  10.0 1e-01 0.03314103 0.02942215
## 9   0.1 1e+00 0.58173077 0.04740051
## 10  1.0 1e+00 0.05865385 0.04942437
## 11  5.0 1e+00 0.05608974 0.04595880
## 12 10.0 1e+00 0.05608974 0.04595880
## 13  0.1 5e+00 0.58173077 0.04740051
## 14  1.0 5e+00 0.51544872 0.06790600
## 15  5.0 5e+00 0.51544872 0.06790600
## 16 10.0 5e+00 0.51544872 0.06790600
## 17  0.1 1e+01 0.58173077 0.04740051
## 18  1.0 1e+01 0.54602564 0.06355090
## 19  5.0 1e+01 0.54102564 0.06959451
## 20 10.0 1e+01 0.54102564 0.06959451
## 21  0.1 1e+02 0.58173077 0.04740051
## 22  1.0 1e+02 0.58173077 0.04740051
## 23  5.0 1e+02 0.58173077 0.04740051
## 24 10.0 1e+02 0.58173077 0.04740051

For the polynomial kernel, 10-fold cross-validation gives its lowest error (0.571) at cost = 10 and degree = 2, which is still very poor. For the radial kernel, the best performance is an error of 0.0203 at cost = 10 and gamma = 0.01, so the radial kernel clearly performs best on this data.

(d) Make some plots to back up your assertions in (b) and (c).

svm.linear = svm(mpglevel ~ ., data = Auto, kernel = "linear", cost = 1)
svm.poly = svm(mpglevel ~ ., data = Auto, kernel = "polynomial", cost = 10, 
    degree = 2)
svm.radial = svm(mpglevel ~ ., data = Auto, kernel = "radial", cost = 10, gamma = 0.01)
plotpairs = function(fit) {
    # plot the fitted SVM against mpg and every other predictor, skipping
    # mpg itself, the derived response mpglevel, and the car name
    for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
        plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
    }
}
plotpairs(svm.linear)
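
The polynomial and radial fits created above can be plotted in the same way to back up the comparison in (c):

plotpairs(svm.poly)
plotpairs(svm.radial)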

Problem 8

This problem involves the OJ data set, which is part of the ISLR package.

(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

set.seed(123)
train = sample(dim(OJ)[1], 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]

(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = 0.01)
summary(svm.linear)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  442
## 
##  ( 220 222 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

The support vector classifier with cost = 0.01 uses 442 of the 800 training observations as support vectors (220 from class CH and 222 from class MM); such a small cost produces a wide, soft margin, so many observations end up on or inside it.

(c) What are the training and test error rates?

train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 426  61
##   MM  71 242
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 145  21
##   MM  27  77
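
The error rates follow from the confusion matrices: (61 + 71)/800 = 16.5% on the training set and (21 + 27)/270 ≈ 17.8% on the test set. They can also be computed directly:

# training and test error rates for the cost = 0.01 support vector classifier
mean(train.pred != OJ.train$Purchase)
mean(test.pred != OJ.test$Purchase)
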
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

set.seed(123)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##      cost
##  3.162278
## 
## - best performance: 0.16375 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.17375 0.04910660
## 2   0.01778279 0.17250 0.04816061
## 3   0.03162278 0.17125 0.04678927
## 4   0.05623413 0.17250 0.04594683
## 5   0.10000000 0.17500 0.04823265
## 6   0.17782794 0.17625 0.04427267
## 7   0.31622777 0.17125 0.04084609
## 8   0.56234133 0.17125 0.04168749
## 9   1.00000000 0.16875 0.03963812
## 10  1.77827941 0.16875 0.04135299
## 11  3.16227766 0.16375 0.04185375
## 12  5.62341325 0.17125 0.03998698
## 13 10.00000000 0.17000 0.04005205

(e) Compute the training and test error rates using this new value for cost.

svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameters$cost)
train.pred = predict(svm.linear, OJ.train)
train_table = table(OJ.train$Purchase, train.pred)
training_error = 1 - sum(diag(train_table)) / sum(train_table)
training_error
## [1] 0.16125
test.pred = predict(svm.linear, OJ.test)
test_table = table(OJ.test$Purchase, test.pred)
test_error = 1-sum(diag(test_table))/sum(test_table)
test_error
## [1] 0.1555556

With the tuned cost, the training error is 16.1% and the test error is 15.6%, a small improvement over the 16.5% and 17.8% obtained with cost = 0.01.
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

set.seed(123)
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial")
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  367
## 
##  ( 181 186 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred = predict(svm.radial, OJ.train)
train_table = table(OJ.train$Purchase, train.pred)
training_error = 1 - sum(diag(train_table)) / sum(train_table)
training_error
## [1] 0.13875
test.pred = predict(svm.radial, OJ.test)
test_table = table(OJ.test$Purchase, test.pred)
test_error = 1-sum(diag(test_table))/sum(test_table)
test_error
## [1] 0.1888889
set.seed(123)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 
    1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.16125 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.39125 0.04411554
## 2   0.01778279 0.39125 0.04411554
## 3   0.03162278 0.37750 0.05027701
## 4   0.05623413 0.20625 0.07126096
## 5   0.10000000 0.17625 0.06108112
## 6   0.17782794 0.17750 0.06230525
## 7   0.31622777 0.17125 0.05104804
## 8   0.56234133 0.16625 0.05272110
## 9   1.00000000 0.16125 0.04875178
## 10  1.77827941 0.16250 0.04564355
## 11  3.16227766 0.16375 0.04226652
## 12  5.62341325 0.16500 0.03717451
## 13 10.00000000 0.17000 0.04794383
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial", cost = tune.out$best.parameters$cost)
train.pred = predict(svm.radial, OJ.train)
train_table = table(OJ.train$Purchase, train.pred)
training_error = 1 - sum(diag(train_table)) / sum(train_table)
training_error
## [1] 0.13875

The cross-validated optimal cost is 1, the same as the default, so the refit gives the same training error of 13.88%.

test.pred = predict(svm.radial, OJ.test)
test_table = table(OJ.test$Purchase, test.pred)
test_error = 1-sum(diag(test_table))/sum(test_table)
test_error
## [1] 0.1888889

The test error is likewise unchanged at 18.89%, which is worse than the 15.6% achieved by the tuned linear classifier.

(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

set.seed(456)
svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
summary(svm.poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  445
## 
##  ( 219 226 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred = predict(svm.poly, OJ.train)
train_table = table(OJ.train$Purchase, train.pred)
training_error = 1 - sum(diag(train_table)) / sum(train_table)
training_error
## [1] 0.1725
test.pred = predict(svm.poly, OJ.test)
test_table = table(OJ.test$Purchase, test.pred)
test_error = 1 - sum(diag(test_table)) / sum(test_table)
test_error
## [1] 0.2222222
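
Part (g) also calls for repeating the tuning step of parts (d) and (e), so cost should be tuned for the degree-2 polynomial kernel as well. A sketch of that step (output omitted; the exact figures depend on the seed):

set.seed(456)
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2,
    ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2,
    cost = tune.out$best.parameters$cost)
# training and test error rates with the tuned cost
mean(predict(svm.poly, OJ.train) != OJ.train$Purchase)
mean(predict(svm.poly, OJ.test) != OJ.test$Purchase)

With the default cost, the degree-2 polynomial kernel gives the worst test error (22.2%) of the three kernels on this split; the tuned linear classifier (15.6%) and the radial kernel (18.9%) both do better.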