Question 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

a: Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.

x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)  # class 1 wherever x1^2 > x2^2, so the true boundary is quadratic
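
A quick added check (not in the original) confirms that the quadratic rule assigns roughly half of the points to each class:

table(y)  # by symmetry, x1^2 > x2^2 holds for about half of the uniform draws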

b: Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.

plot(x1[y == 0], x2[y == 0], col = "purple", xlab = "X1", ylab = "X2", pch = "+")
points(x1[y == 1], x2[y == 1], col = "red", pch = 4)


c: Fit a logistic regression model to the data, using X1 and X2 as predictors.

lm.fit <- glm(y ~ x1 + x2, family = binomial)
summary(lm.fit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.226  -1.187   1.128   1.161   1.211  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.02195    0.08957   0.245    0.806
## x1           0.16403    0.30655   0.535    0.593
## x2          -0.04094    0.30640  -0.134    0.894
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 693.08  on 499  degrees of freedom
## Residual deviance: 692.78  on 497  degrees of freedom
## AIC: 698.78
## 
## Number of Fisher Scoring iterations: 3

d: Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

data <- data.frame(x1 = x1, x2 = x2, y = y)
lm.prob <- predict(lm.fit, data, type = "response")
lm.pred <- ifelse(lm.prob > 0.52, 1, 0)  # the fitted probabilities all lie close to 0.5, so the 0.52 cutoff only shifts where the (still linear) boundary falls
data.pos <- data[lm.pred == 1, ]
data.neg <- data[lm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "purple", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "red", pch = 4)


e: Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).

lm.fit = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), data = data, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
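
These warnings indicate (quasi-)complete separation: the quadratic terms can reproduce the true boundary x1^2 - x2^2 = 0 exactly, so the maximum-likelihood estimates diverge and the fitted probabilities are pushed to 0 and 1. The class predictions in (f) are still usable. A quick added check of the fitted probabilities (output not shown) illustrates this:

summary(lm.fit$fitted.values)  # most fitted values sit at or very near 0 or 1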

f: Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

lm.prob <- predict(lm.fit, data, type = "response")
lm.pred <- ifelse(lm.prob > 0.5, 1, 0)
data.pos <- data[lm.pred == 1, ]
data.neg <- data[lm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "purple", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "red", pch = 4)


g: Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

library(e1071)  # provides svm() and tune()
svm.fit <- svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.pred <- predict(svm.fit, data)
data.pos <- data[svm.pred == 1, ]
data.neg <- data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "purple", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "red", pch = 4)


h: Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm.fit <- svm(as.factor(y) ~ x1 + x2, data, kernel = "radial", gamma = 1)  # radial kernel is also the svm() default
svm.pred <- predict(svm.fit, data)
data.pos <- data[svm.pred == 1, ]
data.neg <- data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "purple", xlab = "X1", ylab = "X2", pch = "+")
points(data.neg$x1, data.neg$x2, col = "red", pch = 4)


i: Comment on your results.

Answer: Logistic regression with only linear terms and the linear support vector classifier both produce linear decision boundaries and therefore fail to capture the true quadratic boundary. Logistic regression with quadratic terms and the radial-kernel SVM both recover the non-linear boundary well. The SVM has the practical advantage that we do not have to specify the non-linear transformation of the features ourselves, though its tuning parameters (gamma and cost) are best chosen by cross-validation, as sketched below.
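
As a sketch of that tuning step (added here; the seed and the grid of cost and gamma values are illustrative choices, not taken from the original), the radial-kernel SVM from (h) could be cross-validated with tune():

set.seed(1)
tune.sim <- tune(svm, as.factor(y) ~ x1 + x2, data = data, kernel = "radial",
    ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
summary(tune.sim)           # 10-fold CV error for each (cost, gamma) pair
tune.sim$best.parameters    # combination with the lowest CV error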


Question 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

a: Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

library(ISLR)  # for the Auto data set
gas.med <- median(Auto$mpg)
new.var <- ifelse(Auto$mpg > gas.med, 1, 0)
Auto$mpglevel <- as.factor(new.var)
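
A quick added check that the median split produces two classes of (nearly) equal size:

table(Auto$mpglevel)  # cars above the median mpg are coded 1, the rest 0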

b: Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

set.seed(9)
tune.out <- tune(svm, mpglevel ~ ., data = Auto, kernel = "linear", ranges = list(cost = c(0.01, 
    0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.01019231 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.07653846 0.05100638
## 2 1e-01 0.05102564 0.03586588
## 3 1e+00 0.01019231 0.01315951
## 4 5e+00 0.02044872 0.01619554
## 5 1e+01 0.02044872 0.01619554
## 6 1e+02 0.03839744 0.03872235
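
The cross-validation error is minimized at cost = 1 (about 1.0%), with both much smaller (0.01) and much larger (100) values doing noticeably worse, so a moderate amount of regularization works best here. Note that mpg itself is among the predictors in mpglevel ~ ., which largely explains the very low error rates. If desired, the winning fit can be extracted directly (an added convenience, not in the original):

bestmod.linear <- tune.out$best.model
summary(bestmod.linear)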

c: Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

set.seed(9)
tune.out <- tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.1, 
    1, 5, 10), degree = c(2, 3, 4)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##    10      2
## 
## - best performance: 0.5333333 
## 
## - Detailed performance results:
##    cost degree     error dispersion
## 1   0.1      2 0.5384615 0.05263181
## 2   1.0      2 0.5384615 0.05263181
## 3   5.0      2 0.5384615 0.05263181
## 4  10.0      2 0.5333333 0.06368525
## 5   0.1      3 0.5359615 0.06001532
## 6   1.0      3 0.5359615 0.06001532
## 7   5.0      3 0.5359615 0.06001532
## 8  10.0      3 0.5359615 0.06001532
## 9   0.1      4 0.5459615 0.03201429
## 10  1.0      4 0.5459615 0.03201429
## 11  5.0      4 0.5459615 0.03201429
## 12 10.0      4 0.5459615 0.03201429
set.seed(9)
tune.out <- tune(svm, mpglevel ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.1, 
    1, 5, 10), gamma = c(0.01, 0.1, 1, 9, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##    10  0.01
## 
## - best performance: 0.02301282 
## 
## - Detailed performance results:
##    cost gamma      error dispersion
## 1   0.1 1e-02 0.08929487 0.05558189
## 2   1.0 1e-02 0.07147436 0.04312562
## 3   5.0 1e-02 0.05102564 0.03376772
## 4  10.0 1e-02 0.02301282 0.02549182
## 5   0.1 1e-01 0.07653846 0.05100638
## 6   1.0 1e-01 0.05615385 0.04294734
## 7   5.0 1e-01 0.03064103 0.02008502
## 8  10.0 1e-01 0.02551282 0.02076457
## 9   0.1 1e+00 0.51096154 0.13709174
## 10  1.0 1e+00 0.06128205 0.05274896
## 11  5.0 1e+00 0.06134615 0.05006757
## 12 10.0 1e+00 0.06134615 0.05006757
## 13  0.1 9e+00 0.54596154 0.03201429
## 14  1.0 9e+00 0.50000000 0.03616228
## 15  5.0 9e+00 0.49487179 0.04143426
## 16 10.0 9e+00 0.49487179 0.04143426
## 17  0.1 1e+01 0.54596154 0.03201429
## 18  1.0 1e+01 0.50506410 0.03662576
## 19  5.0 1e+01 0.49743590 0.03803299
## 20 10.0 1e+01 0.49743590 0.03803299
## 21  0.1 1e+02 0.54846154 0.02632839
## 22  1.0 1e+02 0.54846154 0.02632839
## 23  5.0 1e+02 0.54846154 0.02632839
## 24 10.0 1e+02 0.54846154 0.02632839
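
With a radial kernel the best combination is cost = 10 and gamma = 0.01 (CV error about 2.3%), close to the tuned linear fit, while large gamma values overfit badly, with errors near 50%. The polynomial kernel performs poorly across the whole grid (errors around 53%), possibly because svm() leaves coef0 at its default of 0. One possible refinement (an added sketch, not in the original) is to include coef0 in the tuning grid:

set.seed(9)
tune.poly2 <- tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial",
    ranges = list(cost = c(0.1, 1, 5, 10), degree = c(2, 3), coef0 = c(0, 1)))
summary(tune.poly2)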

d: Make some plots to back up your assertions in (b) and (c).

svm.linear <- svm(mpglevel ~ ., data = Auto, kernel = "linear", cost = 1)
svm.poly <- svm(mpglevel ~ ., data = Auto, kernel = "polynomial", cost = 100, degree = 2)
svm.radial <- svm(mpglevel ~ ., data = Auto, kernel = "radial", cost = 100, gamma = 0.01)
# plot the fitted classifier against mpg and each of the remaining predictors in turn
plotpairs <- function(fit) {
    for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
        plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
    }
}
plotpairs(svm.linear)
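
The same pairwise plots can be produced for the polynomial and radial fits defined above:

plotpairs(svm.poly)
plotpairs(svm.radial)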


Question 8

This problem involves the OJ data set which is part of the ISLR package.

a: Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

set.seed(9)
train <- sample(dim(OJ)[1], 800)
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]

b: Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

svm.linear <- svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = 0.01)
summary(svm.linear)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  426
## 
##  ( 213 213 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

c: What are the training and test error rates?

train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 455  50
##   MM  77 218
train.error = round(mean(train.pred != OJ.train$Purchase)*100,2)
train.error
## [1] 15.88
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 131  17
##   MM  39  83
test.error = round(mean(test.pred != OJ.test$Purchase)*100,2)
test.error
## [1] 20.74

d: Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

set.seed(9)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2, 
    1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##       cost
##  0.3162278
## 
## - best performance: 0.15375 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.15500 0.02776389
## 2   0.01778279 0.16125 0.04016027
## 3   0.03162278 0.15625 0.02960973
## 4   0.05623413 0.15875 0.02889757
## 5   0.10000000 0.15875 0.03120831
## 6   0.17782794 0.15500 0.03184162
## 7   0.31622777 0.15375 0.03175973
## 8   0.56234133 0.15500 0.03343734
## 9   1.00000000 0.15625 0.03186887
## 10  1.77827941 0.15875 0.03335936
## 11  3.16227766 0.15750 0.03129164
## 12  5.62341325 0.16125 0.03087272
## 13 10.00000000 0.16125 0.02972676

e: Compute the training and test error rates using this new value for cost.

svm.linear <- svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameters$cost)
train.pred <- predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 455  50
##   MM  72 223
train.error <- round(mean(train.pred != OJ.train$Purchase) * 100, 2)  # compare against the training labels, not the test set
test.pred <- predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 131  17
##   MM  35  87
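
To complete part (e), the test error rate can be computed the same way (an added line, not in the original):

test.error <- round(mean(test.pred != OJ.test$Purchase) * 100, 2)

From the confusion matrices above, the tuned classifier misclassifies (50 + 72)/800 = 15.25% of the training observations and (17 + 35)/270 ≈ 19.26% of the test observations, a small improvement over the cost = 0.01 fit in (c).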

f: Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

set.seed(9)
svm.radial <- svm(Purchase ~ ., data = OJ.train, kernel = "radial")
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  365
## 
##  ( 187 178 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred <- predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 466  39
##   MM  73 222
test.pred <- predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 132  16
##   MM  37  85
set.seed(9)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 
    1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##       cost
##  0.5623413
## 
## - best performance: 0.155 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.36875 0.05111602
## 2   0.01778279 0.36875 0.05111602
## 3   0.03162278 0.36875 0.05145454
## 4   0.05623413 0.21000 0.03525699
## 5   0.10000000 0.17500 0.03333333
## 6   0.17782794 0.17125 0.02638523
## 7   0.31622777 0.16250 0.03173239
## 8   0.56234133 0.15500 0.02958040
## 9   1.00000000 0.16000 0.03717451
## 10  1.77827941 0.15625 0.03644345
## 11  3.16227766 0.16375 0.02729087
## 12  5.62341325 0.17000 0.02776389
## 13 10.00000000 0.17375 0.02531057
svm.radial <- svm(Purchase ~., kernel = 'radial', data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial", cost = tune.out$best.parameter$cost)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.5623413 
## 
## Number of Support Vectors:  394
## 
##  ( 199 195 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred <- predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 467  38
##   MM  74 221
test.pred <- predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 132  16
##   MM  37  85
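
From these tables, the tuned radial SVM has a training error of (38 + 74)/800 = 14.0% and a test error of (16 + 37)/270 ≈ 19.63%; the rates can also be computed directly (added lines, not in the original):

train.error.radial <- round(mean(train.pred != OJ.train$Purchase) * 100, 2)
test.error.radial <- round(mean(test.pred != OJ.test$Purchase) * 100, 2)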

g: Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set “degree” = 2.

svm.poly <- svm(Purchase ~ ., kernel = "polynomial", data = OJ.train, degree = 2)
summary(svm.poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial", 
##     degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  431
## 
##  ( 219 212 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred <- predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 468  37
##   MM 105 190
test.pred <- predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 136  12
##   MM  45  77
set.seed(2)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2, ranges = list(cost = 10^seq(-2, 
    1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.165 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.36750 0.06826216
## 2   0.01778279 0.34500 0.06593347
## 3   0.03162278 0.33750 0.06373774
## 4   0.05623413 0.32875 0.05684103
## 5   0.10000000 0.30500 0.05143766
## 6   0.17782794 0.25500 0.04901814
## 7   0.31622777 0.21500 0.05230785
## 8   0.56234133 0.20250 0.04993051
## 9   1.00000000 0.19250 0.05565269
## 10  1.77827941 0.18375 0.04860913
## 11  3.16227766 0.17500 0.04564355
## 12  5.62341325 0.17000 0.04338138
## 13 10.00000000 0.16500 0.04362084
svm.poly <- svm(Purchase ~ ., kernel = "polynomial", degree = 2, data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial", 
##     degree = 2, cost = tune.out$best.parameter$cost)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  10 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  335
## 
##  ( 170 165 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train.pred <- predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
##     train.pred
##       CH  MM
##   CH 470  35
##   MM  72 223
test.pred <- predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
##     test.pred
##       CH  MM
##   CH 132  16
##   MM  40  82
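
Similarly, the tuned degree-2 polynomial SVM has a training error of (35 + 72)/800 ≈ 13.38% and a test error of (16 + 40)/270 ≈ 20.74% (added lines, not in the original):

train.error.poly <- round(mean(train.pred != OJ.train$Purchase) * 100, 2)
test.error.poly <- round(mean(test.pred != OJ.test$Purchase) * 100, 2)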

h: Overall, which approach seems to give the best results on this data?

Answer: All three tuned SVMs perform broadly similarly on this data. The degree-2 polynomial kernel has the lowest training error (about 13.4%) but the highest test error (about 20.7%), suggesting some overfitting. The tuned linear and radial fits are nearly indistinguishable, with cross-validation errors of 0.154 and 0.155 and test errors of about 19.3% and 19.6% respectively, so on this particular train/test split the linear support vector classifier gives marginally the best test performance, with the radial kernel a very close second.