Chapter 09 (page 368): 5, 7, 8

Problem 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

a. Generate a data set with n=500 and p=2, such that the observations belong to two classes with a quadratic decision boundary between them.

set.seed(100)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)

b. Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.

plot(x1[y == 0], x2[y == 0], col = "#66C2A5", xlab = "X1", ylab = "X2", pch = "+", cex=1.5)
points(x1[y == 1], x2[y == 1], col = "#FC8D62", pch = 4, cex=1.5)
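The class rule x1^2 - x2^2 > 0 is the same as |x1| > |x2|, so the true boundary is the pair of diagonals x2 = x1 and x2 = -x1. The following is a minimal sketch, not part of the original solution, that regenerates the data and overlays those two lines as a visual check. The dashed line style is an arbitrary choice.

```r
# Regenerate the simulated data exactly as in part (a)
set.seed(100)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)

# Equivalent form of the class rule: class 1 whenever |x1| > |x2|
y.alt <- 1 * (abs(x1) > abs(x2))

# Plot both classes at once and overlay the true boundary x2 = x1 and x2 = -x1
plot(x1, x2, col = ifelse(y == 1, "#FC8D62", "#66C2A5"),
     pch = ifelse(y == 1, 4, 3), xlab = "X1", ylab = "X2")
abline(0, 1, lty = 2)
abline(0, -1, lty = 2)
```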

c. Fit a logistic regression model to the data, using X1 and X2 as predictors.

lm.model5 = glm(y ~ x1 + x2, family = binomial)
summary(lm.model5)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.194  -1.154  -1.119   1.195   1.245  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.05600    0.08989  -0.623    0.533
## x1          -0.14615    0.32098  -0.455    0.649
## x2           0.06528    0.30338   0.215    0.830
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.76  on 499  degrees of freedom
## Residual deviance: 692.50  on 497  degrees of freedom
## AIC: 698.5
## 
## Number of Fisher Scoring iterations: 3

d. Apply this model to training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

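data = data.frame(x1 = x1, x2 = x2, y = y)
preds = predict(lm.model5, data, type = "response")
# Threshold the fitted probabilities at 0.5 to get class labels
preds.choice = ifelse(preds > 0.5, 1, 0)
data.positive = data[preds.choice == 1, ]
data.negative = data[preds.choice == 0, ]
# Same coding as in (b): class 0 is green "+", class 1 is orange "x".
# Fixed axis limits keep both predicted classes fully in view.
plot(data.negative$x1, data.negative$x2, col = "#66C2A5", xlab = "X1", ylab = "X2", pch = "+", cex=1.5, xlim = range(x1), ylim = range(x2))
points(data.positive$x1, data.positive$x2, col = "#FC8D62", pch = 4, cex=1.5)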

e. Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.

lm.model5.nl <- glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(lm.model5.nl)
## 
## Call:
## glm(formula = y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), family = "binomial")
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -1.564e-03  -2.000e-08  -2.000e-08   2.000e-08   1.502e-03  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)     -222.21   13258.36  -0.017    0.987
## poly(x1, 2)1    3902.33  364800.33   0.011    0.991
## poly(x1, 2)2   33807.47  937118.99   0.036    0.971
## poly(x2, 2)1    -656.84  387630.34  -0.002    0.999
## poly(x2, 2)2  -35213.02 1032138.51  -0.034    0.973
## I(x1 * x2)        64.72  130327.27   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.9276e+02  on 499  degrees of freedom
## Residual deviance: 5.1269e-06  on 494  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 25

f. Apply this model to training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should obviously be non-linear.

preds <- predict(lm.model5.nl, data, type = "response")
preds.choice <- rep(0, 500)
preds.choice[preds > 0.5] <- 1
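# Color coding matches (b): class 0 in green, class 1 in orange; fixed limits keep every point in view
plot(data[preds.choice == 0, ]$x1, data[preds.choice == 0, ]$x2, col = "#66C2A5", pch = 3, xlab = "X1", ylab = "X2", cex=1.5, xlim = range(x1), ylim = range(x2))
points(data[preds.choice == 1, ]$x1, data[preds.choice == 1, ]$x2, col = "#FC8D62", pch = 2, cex=1.5)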

g. Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

library(e1071)  # provides svm() and tune()
data$y <- as.factor(data$y)
svm.model5 <- svm(y ~ x1 + x2, data, kernel = "linear", cost = 0.01)
preds <- predict(svm.model5, data)
# Fixed axis limits keep the plot valid even if one class is never predicted
plot(data[preds == 0, ]$x1, data[preds == 0, ]$x2, col = "#66C2A5", pch = (3 - 0), xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(data[preds == 1, ]$x1, data[preds == 1, ]$x2, col = "#FC8D62", pch = (3 - 1))

h. Fit a SVM using a non-linear kernel to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

data$y <- as.factor(data$y)
svm.model5.nl <- svm(y ~ x1 + x2, data, kernel = "radial", gamma = 1)
preds <- predict(svm.model5.nl, data)
plot(data[preds == 0, ]$x1, data[preds == 0, ]$x2, col = "#66C2A5", pch = (3 - 0), xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(data[preds == 1, ]$x1, data[preds == 1, ]$x2, col = "#FC8D62", pch = (3 - 1))

i. Comment on your results.

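The logistic regression with linear terms and the support vector classifier can only produce linear decision boundaries, so neither can capture the quadratic boundary; in (c) no coefficient is significant and the residual deviance (692.50) is barely below the null deviance (692.76). The logistic regression with non-linear terms and the SVM with a radial kernel worked best on our generated data. The warnings in (e) occur because the classes are perfectly separated by the quadratic terms, so the coefficient estimates and their standard errors blow up, but the predicted class labels remain usable.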
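One way to put a number on the comparison is the training error rate of each logistic fit. The sketch below is an illustration under assumptions, not the original author's code: it regenerates the data from (a), refits the models from (c) and (e), and uses a hypothetical helper train.error() to compute the share of misclassified training points. The warnings from the quadratic fit are suppressed because they are the separation warnings already shown in (e).

```r
# Regenerate the simulated data from part (a)
set.seed(100)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)

# Hypothetical helper: share of training points misclassified at a 0.5 threshold
train.error <- function(fit) {
  pred <- ifelse(predict(fit, type = "response") > 0.5, 1, 0)
  mean(pred != y)
}

# Linear terms only, as in (c)
fit.lin <- glm(y ~ x1 + x2, family = binomial)
# Quadratic and interaction terms, as in (e); separation warnings are expected
fit.quad <- suppressWarnings(
  glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), family = binomial)
)

errors <- c(linear = train.error(fit.lin), quadratic = train.error(fit.quad))
print(errors)
```

The same helper idea carries over to the two SVM fits by comparing predict() output with the factor response.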

Problem 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

a. Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

library(ISLR)  # provides the Auto and OJ data sets
data2 = Auto
# 1 if mpg is above the median, 0 otherwise
mpg.index <- ifelse(data2$mpg > median(data2$mpg), 1, 0)
data2$mpglevel <- as.factor(mpg.index)

b. Fit a support vector classifier to the data with various values of “cost”, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

set.seed(100)
tune.linear <- tune(svm, mpglevel ~ ., data = data2, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100, 1000)))
summary(tune.linear)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.01512821 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.07391026 0.04398186
## 2 1e-01 0.05102564 0.03408666
## 3 1e+00 0.01512821 0.02421271
## 4 5e+00 0.01775641 0.01700310
## 5 1e+01 0.02538462 0.02372507
## 6 1e+02 0.03564103 0.02125655
## 7 1e+03 0.03564103 0.02125655

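A cost of 1 gives the lowest cross-validation error in our example: 0.01512821. Note that the formula mpglevel ~ . keeps mpg itself (from which mpglevel is derived) and name among the predictors, so these error rates are likely optimistic.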
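Because mpglevel is computed from mpg, a fit that sees mpg can almost read off the answer. The following is a hedged sketch of a variant that drops mpg and name before tuning. The object names auto, auto.nomileage and tune.noleak and the shortened cost grid are assumptions for illustration, and no results are claimed for it here. The code is guarded so that it only runs when the ISLR and e1071 packages are installed.

```r
# Sketch: tune a linear support vector classifier without the variable that defines the response
if (requireNamespace("ISLR", quietly = TRUE) && requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  auto <- ISLR::Auto
  auto$mpglevel <- as.factor(ifelse(auto$mpg > median(auto$mpg), 1, 0))

  # Drop mpg (it defines mpglevel) and name (a text label for the car model)
  auto.nomileage <- auto[, !(names(auto) %in% c("mpg", "name"))]

  set.seed(100)
  # Shortened cost grid (an assumption) to keep the run time small
  tune.noleak <- tune(svm, mpglevel ~ ., data = auto.nomileage, kernel = "linear",
                      ranges = list(cost = c(0.01, 0.1, 1, 10)))
  print(summary(tune.noleak))
}
```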

c. Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of “gamma” and “degree” and “cost”. Comment on your results.

set.seed(100)
tune.poly <- tune(svm, mpglevel ~ ., data = data2, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), degree = c(2, 3, 4),gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.poly)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree gamma
##   100      3   0.1
## 
## - best performance: 0.03314103 
## 
## - Detailed performance results:
##      cost degree gamma      error dispersion
## 1   1e-02      2 1e-02 0.54858974 0.06532230
## 2   1e-01      2 1e-02 0.54858974 0.06532230
## 3   1e+00      2 1e-02 0.53858974 0.06213857
## 4   5e+00      2 1e-02 0.37538462 0.07420913
## 5   1e+01      2 1e-02 0.30641026 0.06169259
## 6   1e+02      2 1e-02 0.24487179 0.08534392
## 7   1e-02      3 1e-02 0.54358974 0.07674778
## 8   1e-01      3 1e-02 0.54358974 0.07674778
## 9   1e+00      3 1e-02 0.52576923 0.07655229
## 10  5e+00      3 1e-02 0.29358974 0.06514817
## 11  1e+01      3 1e-02 0.27570513 0.06332899
## 12  1e+02      3 1e-02 0.24243590 0.07096192
## 13  1e-02      4 1e-02 0.54858974 0.06532230
## 14  1e-01      4 1e-02 0.54858974 0.06532230
## 15  1e+00      4 1e-02 0.54858974 0.06532230
## 16  5e+00      4 1e-02 0.54858974 0.06532230
## 17  1e+01      4 1e-02 0.54108974 0.05963219
## 18  1e+02      4 1e-02 0.41333333 0.06285234
## 19  1e-02      2 1e-01 0.53858974 0.06213857
## 20  1e-01      2 1e-01 0.30641026 0.06169259
## 21  1e+00      2 1e-01 0.24487179 0.08534392
## 22  5e+00      2 1e-01 0.15070513 0.04597390
## 23  1e+01      2 1e-01 0.14057692 0.05834237
## 24  1e+02      2 1e-01 0.15307692 0.04517581
## 25  1e-02      3 1e-01 0.27570513 0.06332899
## 26  1e-01      3 1e-01 0.24243590 0.07096192
## 27  1e+00      3 1e-01 0.07891026 0.03447059
## 28  5e+00      3 1e-01 0.05602564 0.01980436
## 29  1e+01      3 1e-01 0.04333333 0.02695392
## 30  1e+02      3 1e-01 0.03314103 0.02101863
## 31  1e-02      4 1e-01 0.41333333 0.06285234
## 32  1e-01      4 1e-01 0.31147436 0.07123846
## 33  1e+00      4 1e-01 0.23224359 0.08262688
## 34  5e+00      4 1e-01 0.18391026 0.06160257
## 35  1e+01      4 1e-01 0.19160256 0.06565809
## 36  1e+02      4 1e-01 0.18128205 0.06133469
## 37  1e-02      2 1e+00 0.24487179 0.08534392
## 38  1e-01      2 1e+00 0.14057692 0.05834237
## 39  1e+00      2 1e+00 0.15307692 0.04517581
## 40  5e+00      2 1e+00 0.17615385 0.05497876
## 41  1e+01      2 1e+00 0.17871795 0.06294743
## 42  1e+02      2 1e+00 0.17871795 0.06294743
## 43  1e-02      3 1e+00 0.04333333 0.02695392
## 44  1e-01      3 1e+00 0.03314103 0.02101863
## 45  1e+00      3 1e+00 0.03564103 0.02445289
## 46  5e+00      3 1e+00 0.03564103 0.02445289
## 47  1e+01      3 1e+00 0.03564103 0.02445289
## 48  1e+02      3 1e+00 0.03564103 0.02445289
## 49  1e-02      4 1e+00 0.18128205 0.06133469
## 50  1e-01      4 1e+00 0.18641026 0.05592726
## 51  1e+00      4 1e+00 0.18653846 0.06354458
## 52  5e+00      4 1e+00 0.18653846 0.06354458
## 53  1e+01      4 1e+00 0.18653846 0.06354458
## 54  1e+02      4 1e+00 0.18653846 0.06354458
## 55  1e-02      2 5e+00 0.13532051 0.03652090
## 56  1e-01      2 5e+00 0.17365385 0.06414276
## 57  1e+00      2 5e+00 0.17871795 0.06294743
## 58  5e+00      2 5e+00 0.17871795 0.06294743
## 59  1e+01      2 5e+00 0.17871795 0.06294743
## 60  1e+02      2 5e+00 0.17871795 0.06294743
## 61  1e-02      3 5e+00 0.03564103 0.02445289
## 62  1e-01      3 5e+00 0.03564103 0.02445289
## 63  1e+00      3 5e+00 0.03564103 0.02445289
## 64  5e+00      3 5e+00 0.03564103 0.02445289
## 65  1e+01      3 5e+00 0.03564103 0.02445289
## 66  1e+02      3 5e+00 0.03564103 0.02445289
## 67  1e-02      4 5e+00 0.18653846 0.06354458
## 68  1e-01      4 5e+00 0.18653846 0.06354458
## 69  1e+00      4 5e+00 0.18653846 0.06354458
## 70  5e+00      4 5e+00 0.18653846 0.06354458
## 71  1e+01      4 5e+00 0.18653846 0.06354458
## 72  1e+02      4 5e+00 0.18653846 0.06354458
## 73  1e-02      2 1e+01 0.15307692 0.04517581
## 74  1e-01      2 1e+01 0.17871795 0.06294743
## 75  1e+00      2 1e+01 0.17871795 0.06294743
## 76  5e+00      2 1e+01 0.17871795 0.06294743
## 77  1e+01      2 1e+01 0.17871795 0.06294743
## 78  1e+02      2 1e+01 0.17871795 0.06294743
## 79  1e-02      3 1e+01 0.03564103 0.02445289
## 80  1e-01      3 1e+01 0.03564103 0.02445289
## 81  1e+00      3 1e+01 0.03564103 0.02445289
## 82  5e+00      3 1e+01 0.03564103 0.02445289
## 83  1e+01      3 1e+01 0.03564103 0.02445289
## 84  1e+02      3 1e+01 0.03564103 0.02445289
## 85  1e-02      4 1e+01 0.18653846 0.06354458
## 86  1e-01      4 1e+01 0.18653846 0.06354458
## 87  1e+00      4 1e+01 0.18653846 0.06354458
## 88  5e+00      4 1e+01 0.18653846 0.06354458
## 89  1e+01      4 1e+01 0.18653846 0.06354458
## 90  1e+02      4 1e+01 0.18653846 0.06354458
## 91  1e-02      2 1e+02 0.17871795 0.06294743
## 92  1e-01      2 1e+02 0.17871795 0.06294743
## 93  1e+00      2 1e+02 0.17871795 0.06294743
## 94  5e+00      2 1e+02 0.17871795 0.06294743
## 95  1e+01      2 1e+02 0.17871795 0.06294743
## 96  1e+02      2 1e+02 0.17871795 0.06294743
## 97  1e-02      3 1e+02 0.03564103 0.02445289
## 98  1e-01      3 1e+02 0.03564103 0.02445289
## 99  1e+00      3 1e+02 0.03564103 0.02445289
## 100 5e+00      3 1e+02 0.03564103 0.02445289
## 101 1e+01      3 1e+02 0.03564103 0.02445289
## 102 1e+02      3 1e+02 0.03564103 0.02445289
## 103 1e-02      4 1e+02 0.18653846 0.06354458
## 104 1e-01      4 1e+02 0.18653846 0.06354458
## 105 1e+00      4 1e+02 0.18653846 0.06354458
## 106 5e+00      4 1e+02 0.18653846 0.06354458
## 107 1e+01      4 1e+02 0.18653846 0.06354458
## 108 1e+02      4 1e+02 0.18653846 0.06354458

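From the above table, the lowest error is 0.03314103, obtained with a degree of 3. tune() reports a cost of 1e+02 and a gamma of 1e-01 (row 30); a cost of 1e-01 with a gamma of 1e+00 (row 44) attains exactly the same error. This error is higher than that of the linear SVM (0.01512821).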
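Many rows of the polynomial table repeat the same error, for example rows 30 and 44. A likely explanation is that with coef0 = 0 the polynomial kernel used by svm(), (gamma * u'v + coef0)^degree, is simply rescaled when gamma changes, and in the SVM dual a kernel rescaled by a factor s can be offset by dividing the cost by s. The sketch below only checks the rescaling step in base R; poly.kernel and the vectors u and v are hypothetical names and values chosen for illustration.

```r
# Polynomial kernel in the form documented for e1071::svm
poly.kernel <- function(u, v, gamma, degree, coef0 = 0) {
  (gamma * sum(u * v) + coef0)^degree
}

# Arbitrary illustrative vectors
u <- c(1.0, -2.0, 0.5)
v <- c(0.3, 0.7, -1.2)

k.small <- poly.kernel(u, v, gamma = 0.1, degree = 3)
k.large <- poly.kernel(u, v, gamma = 1, degree = 3)

# Multiplying gamma by 10 multiplies a degree-3 kernel by 10^3
ratio <- k.large / k.small
print(ratio)
```

Under that argument, cost = 100 with gamma = 0.1 and cost = 0.1 with gamma = 1 describe the same degree-3 classifier, since 100 / 10^3 = 0.1.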

set.seed(100)
tune.radial <- tune(svm, mpglevel ~ ., data = data2, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), degree = c(2, 3, 4),gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.radial)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree gamma
##   100      2  0.01
## 
## - best performance: 0.01512821 
## 
## - Detailed performance results:
##      cost degree gamma      error dispersion
## 1   1e-02      2 1e-02 0.52358974 0.13184498
## 2   1e-01      2 1e-02 0.08673077 0.04371113
## 3   1e+00      2 1e-02 0.07391026 0.04398186
## 4   5e+00      2 1e-02 0.04846154 0.02806915
## 5   1e+01      2 1e-02 0.02544872 0.02662645
## 6   1e+02      2 1e-02 0.01512821 0.02421271
## 7   1e-02      3 1e-02 0.52358974 0.13184498
## 8   1e-01      3 1e-02 0.08673077 0.04371113
## 9   1e+00      3 1e-02 0.07391026 0.04398186
## 10  5e+00      3 1e-02 0.04846154 0.02806915
## 11  1e+01      3 1e-02 0.02544872 0.02662645
## 12  1e+02      3 1e-02 0.01512821 0.02421271
## 13  1e-02      4 1e-02 0.52358974 0.13184498
## 14  1e-01      4 1e-02 0.08673077 0.04371113
## 15  1e+00      4 1e-02 0.07391026 0.04398186
## 16  5e+00      4 1e-02 0.04846154 0.02806915
## 17  1e+01      4 1e-02 0.02544872 0.02662645
## 18  1e+02      4 1e-02 0.01512821 0.02421271
## 19  1e-02      2 1e-01 0.16057692 0.07482371
## 20  1e-01      2 1e-01 0.07903846 0.04566860
## 21  1e+00      2 1e-01 0.04589744 0.02894049
## 22  5e+00      2 1e-01 0.02032051 0.01592719
## 23  1e+01      2 1e-01 0.02032051 0.01592719
## 24  1e+02      2 1e-01 0.02801282 0.01434666
## 25  1e-02      3 1e-01 0.16057692 0.07482371
## 26  1e-01      3 1e-01 0.07903846 0.04566860
## 27  1e+00      3 1e-01 0.04589744 0.02894049
## 28  5e+00      3 1e-01 0.02032051 0.01592719
## 29  1e+01      3 1e-01 0.02032051 0.01592719
## 30  1e+02      3 1e-01 0.02801282 0.01434666
## 31  1e-02      4 1e-01 0.16057692 0.07482371
## 32  1e-01      4 1e-01 0.07903846 0.04566860
## 33  1e+00      4 1e-01 0.04589744 0.02894049
## 34  5e+00      4 1e-01 0.02032051 0.01592719
## 35  1e+01      4 1e-01 0.02032051 0.01592719
## 36  1e+02      4 1e-01 0.02801282 0.01434666
## 37  1e-02      2 1e+00 0.51608974 0.15412796
## 38  1e-01      2 1e+00 0.51608974 0.15412796
## 39  1e+00      2 1e+00 0.05365385 0.03722373
## 40  5e+00      2 1e+00 0.04852564 0.03517544
## 41  1e+01      2 1e+00 0.04852564 0.03517544
## 42  1e+02      2 1e+00 0.04852564 0.03517544
## 43  1e-02      3 1e+00 0.51608974 0.15412796
## 44  1e-01      3 1e+00 0.51608974 0.15412796
## 45  1e+00      3 1e+00 0.05365385 0.03722373
## 46  5e+00      3 1e+00 0.04852564 0.03517544
## 47  1e+01      3 1e+00 0.04852564 0.03517544
## 48  1e+02      3 1e+00 0.04852564 0.03517544
## 49  1e-02      4 1e+00 0.51608974 0.15412796
## 50  1e-01      4 1e+00 0.51608974 0.15412796
## 51  1e+00      4 1e+00 0.05365385 0.03722373
## 52  5e+00      4 1e+00 0.04852564 0.03517544
## 53  1e+01      4 1e+00 0.04852564 0.03517544
## 54  1e+02      4 1e+00 0.04852564 0.03517544
## 55  1e-02      2 5e+00 0.54858974 0.06532230
## 56  1e-01      2 5e+00 0.54858974 0.06532230
## 57  1e+00      2 5e+00 0.47948718 0.07716072
## 58  5e+00      2 5e+00 0.47692308 0.07824190
## 59  1e+01      2 5e+00 0.47692308 0.07824190
## 60  1e+02      2 5e+00 0.47692308 0.07824190
## 61  1e-02      3 5e+00 0.54858974 0.06532230
## 62  1e-01      3 5e+00 0.54858974 0.06532230
## 63  1e+00      3 5e+00 0.47948718 0.07716072
## 64  5e+00      3 5e+00 0.47692308 0.07824190
## 65  1e+01      3 5e+00 0.47692308 0.07824190
## 66  1e+02      3 5e+00 0.47692308 0.07824190
## 67  1e-02      4 5e+00 0.54858974 0.06532230
## 68  1e-01      4 5e+00 0.54858974 0.06532230
## 69  1e+00      4 5e+00 0.47948718 0.07716072
## 70  5e+00      4 5e+00 0.47692308 0.07824190
## 71  1e+01      4 5e+00 0.47692308 0.07824190
## 72  1e+02      4 5e+00 0.47692308 0.07824190
## 73  1e-02      2 1e+01 0.55608974 0.05262769
## 74  1e-01      2 1e+01 0.55608974 0.05262769
## 75  1e+00      2 1e+01 0.50237179 0.07356143
## 76  5e+00      2 1e+01 0.49987179 0.07524572
## 77  1e+01      2 1e+01 0.49987179 0.07524572
## 78  1e+02      2 1e+01 0.49987179 0.07524572
## 79  1e-02      3 1e+01 0.55608974 0.05262769
## 80  1e-01      3 1e+01 0.55608974 0.05262769
## 81  1e+00      3 1e+01 0.50237179 0.07356143
## 82  5e+00      3 1e+01 0.49987179 0.07524572
## 83  1e+01      3 1e+01 0.49987179 0.07524572
## 84  1e+02      3 1e+01 0.49987179 0.07524572
## 85  1e-02      4 1e+01 0.55608974 0.05262769
## 86  1e-01      4 1e+01 0.55608974 0.05262769
## 87  1e+00      4 1e+01 0.50237179 0.07356143
## 88  5e+00      4 1e+01 0.49987179 0.07524572
## 89  1e+01      4 1e+01 0.49987179 0.07524572
## 90  1e+02      4 1e+01 0.49987179 0.07524572
## 91  1e-02      2 1e+02 0.55608974 0.05262769
## 92  1e-01      2 1e+02 0.55608974 0.05262769
## 93  1e+00      2 1e+02 0.55608974 0.05262769
## 94  5e+00      2 1e+02 0.55608974 0.05262769
## 95  1e+01      2 1e+02 0.55608974 0.05262769
## 96  1e+02      2 1e+02 0.55608974 0.05262769
## 97  1e-02      3 1e+02 0.55608974 0.05262769
## 98  1e-01      3 1e+02 0.55608974 0.05262769
## 99  1e+00      3 1e+02 0.55608974 0.05262769
## 100 5e+00      3 1e+02 0.55608974 0.05262769
## 101 1e+01      3 1e+02 0.55608974 0.05262769
## 102 1e+02      3 1e+02 0.55608974 0.05262769
## 103 1e-02      4 1e+02 0.55608974 0.05262769
## 104 1e-01      4 1e+02 0.55608974 0.05262769
## 105 1e+00      4 1e+02 0.55608974 0.05262769
## 106 5e+00      4 1e+02 0.55608974 0.05262769
## 107 1e+01      4 1e+02 0.55608974 0.05262769
## 108 1e+02      4 1e+02 0.55608974 0.05262769

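From the above table, we can see that a cost of 1e+02 and a gamma of 1e-02 offer the lowest error: 0.01512821. The degree parameter has no effect on the radial kernel, which is why the results are identical for degrees 2, 3 and 4; tune() simply reports the first of them. This error is equal to that of the linear SVM.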
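The radial kernel used by svm() is exp(-gamma * |u - v|^2), which contains no degree term, so tuning over degree only repeats work. As a rough sketch in base R, the grid sizes with and without degree can be compared directly; grid.full, grid.radial and radial.kernel are illustrative names, not objects from the original analysis.

```r
# Radial kernel in the form documented for e1071::svm: no degree parameter appears
radial.kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

costs <- c(0.01, 0.1, 1, 5, 10, 100)
gammas <- c(0.01, 0.1, 1, 5, 10, 100)

# Grid used above, including the irrelevant degree dimension
grid.full <- expand.grid(cost = costs, degree = c(2, 3, 4), gamma = gammas)
# Grid that varies only the parameters the radial kernel actually uses
grid.radial <- expand.grid(cost = costs, gamma = gammas)

print(c(full = nrow(grid.full), radial = nrow(grid.radial)))
```

Passing list(cost = costs, gamma = gammas) as the ranges argument would then cover the same radial models with a third of the fits.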

d. Make some plots to back up your assertions in (b) and (c).

# Refit each kernel with parameters selected in (b) and (c)
tune.linear <- svm(mpglevel ~ ., data = data2, kernel = "linear", cost = 1)
tune.poly <- svm(mpglevel ~ ., data = data2, kernel = "polynomial", cost = 0.1, degree = 3, gamma = 1)
tune.radial <- svm(mpglevel ~ ., data = data2, kernel = "radial", cost = 100, degree = 2, gamma = 0.01)
# Plot a fitted classifier on mpg against each remaining predictor, one pair at a time
plotpairs = function(fit) {
    for (name in names(data2)[!(names(data2) %in% c("mpg", "mpglevel", "name"))]) {
        plot(fit, data2, as.formula(paste("mpg~", name, sep = "")))
    }
}
plotpairs(tune.linear)
plotpairs(tune.poly)
plotpairs(tune.radial)

Problem 8

This problem involves the “OJ” data set which is part of the ISLR package.

a. Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

set.seed(100)
data3 = OJ
index <- sample(nrow(data3), 800)
train <- data3[index, ]
test <- data3[-index, ]

b. Fit a support vector classifier to the training data using “cost” = 0.01, with “Purchase” as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

set.seed(100)
svm.linear <- svm(Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
summary(svm.linear)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  432
## 
##  ( 216 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

There are 432 support vectors. 216 belong to the CH class and the other 216 belong to the MM class.

c. What are the training and test error rates?

library(caret)  # provides confusionMatrix()
train.preds <- predict(svm.linear, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
1-0.8338
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 433  78
##         MM  55 234
##                                           
##                Accuracy : 0.8338          
##                  95% CI : (0.8061, 0.8589)
##     No Information Rate : 0.61            
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.6459          
##                                           
##  Mcnemar's Test P-Value : 0.05644         
##                                           
##             Sensitivity : 0.8873          
##             Specificity : 0.7500          
##          Pos Pred Value : 0.8474          
##          Neg Pred Value : 0.8097          
##              Prevalence : 0.6100          
##          Detection Rate : 0.5413          
##    Detection Prevalence : 0.6388          
##       Balanced Accuracy : 0.8186          
##                                           
##        'Positive' Class : CH              
##                                           
## [1] 0.1662
test.preds <- predict(svm.linear, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 147  26
##         MM  18  79
##                                          
##                Accuracy : 0.837          
##                  95% CI : (0.7875, 0.879)
##     No Information Rate : 0.6111         
##     P-Value [Acc > NIR] : 5.198e-16      
##                                          
##                   Kappa : 0.6523         
##                                          
##  Mcnemar's Test P-Value : 0.2913         
##                                          
##             Sensitivity : 0.8909         
##             Specificity : 0.7524         
##          Pos Pred Value : 0.8497         
##          Neg Pred Value : 0.8144         
##              Prevalence : 0.6111         
##          Detection Rate : 0.5444         
##    Detection Prevalence : 0.6407         
##       Balanced Accuracy : 0.8216         
##                                          
##        'Positive' Class : CH             
## 
1-0.837
## [1] 0.163

The training error rate is 0.1662 and the test error rate is 0.163.
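The error rates above are typed in by hand as one minus the printed accuracy. They can also be computed from the confusion matrix counts, which avoids rounding. The sketch below rebuilds the two tables from the counts shown in the output; train.cm, test.cm and error.rate are hypothetical names introduced for illustration.

```r
# Confusion matrices rebuilt from the printed counts (rows: prediction, columns: reference)
train.cm <- matrix(c(433, 55, 78, 234), nrow = 2,
                   dimnames = list(Prediction = c("CH", "MM"), Reference = c("CH", "MM")))
test.cm <- matrix(c(147, 18, 26, 79), nrow = 2,
                  dimnames = list(Prediction = c("CH", "MM"), Reference = c("CH", "MM")))

# Error rate = share of observations off the diagonal
error.rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)

train.err <- error.rate(train.cm)
test.err <- error.rate(test.cm)
print(c(train = train.err, test = test.err))
```

With the fitted objects at hand, mean(train.preds != train$Purchase) gives the same quantity without going through the table.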

d. Use the tune() function to select an optimal “cost”. Consider values in the range 0.01 to 10.

set.seed(100)
tune.cost <- tune(svm, Purchase ~ ., data = train, kernel = "linear", ranges = list(cost = 10^seq(-2, 1, by = 0.5)))
summary(tune.cost)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##        cost
##  0.03162278
## 
## - best performance: 0.16875 
## 
## - Detailed performance results:
##          cost   error dispersion
## 1  0.01000000 0.17500 0.04639804
## 2  0.03162278 0.16875 0.04299952
## 3  0.10000000 0.16875 0.03875224
## 4  0.31622777 0.17375 0.03928617
## 5  1.00000000 0.17375 0.04101575
## 6  3.16227766 0.17375 0.03251602
## 7 10.00000000 0.17000 0.03782269

tune() selects a cost of 0.03162278. A cost of 0.1 gives the same cross-validation error (0.16875), with slightly lower dispersion.
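Two cost values tie for the lowest cross-validation error in the table above, and the reported best value is the first of them. A small base R sketch, using the errors copied from the output, shows how to list all ties and how a first-minimum rule picks among them. The data frame cv is a hypothetical stand-in for the performance table, and the first-minimum behaviour is inferred from the output rather than taken from the tune() source.

```r
# Cross-validation results copied from the tune() output above
cv <- data.frame(
  cost = 10^seq(-2, 1, by = 0.5),
  error = c(0.17500, 0.16875, 0.16875, 0.17375, 0.17375, 0.17375, 0.17000),
  dispersion = c(0.04639804, 0.04299952, 0.03875224, 0.03928617,
                 0.04101575, 0.03251602, 0.03782269)
)

# All rows that attain the minimum error
ties <- cv[cv$error == min(cv$error), ]
print(ties)

# which.min() returns the first minimum, matching the cost reported as best
first.min <- cv$cost[which.min(cv$error)]
print(first.min)
```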

e. Compute the training and test error rates using this new value for “cost”.

set.seed(100)
# Refit with the cost selected by tune(); the component is named best.parameters
svm.linear <- svm(Purchase ~ ., kernel = "linear", data = train, cost = tune.cost$best.parameters$cost)
train.preds <- predict(svm.linear, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
1-0.8388
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 426  67
##         MM  62 245
##                                           
##                Accuracy : 0.8388          
##                  95% CI : (0.8114, 0.8636)
##     No Information Rate : 0.61            
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6601          
##                                           
##  Mcnemar's Test P-Value : 0.7247          
##                                           
##             Sensitivity : 0.8730          
##             Specificity : 0.7853          
##          Pos Pred Value : 0.8641          
##          Neg Pred Value : 0.7980          
##              Prevalence : 0.6100          
##          Detection Rate : 0.5325          
##    Detection Prevalence : 0.6162          
##       Balanced Accuracy : 0.8291          
##                                           
##        'Positive' Class : CH              
##                                           
## [1] 0.1612
test.preds <- predict(svm.linear, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 142  25
##         MM  23  80
##                                           
##                Accuracy : 0.8222          
##                  95% CI : (0.7713, 0.8659)
##     No Information Rate : 0.6111          
##     P-Value [Acc > NIR] : 4.866e-14       
##                                           
##                   Kappa : 0.6247          
##                                           
##  Mcnemar's Test P-Value : 0.8852          
##                                           
##             Sensitivity : 0.8606          
##             Specificity : 0.7619          
##          Pos Pred Value : 0.8503          
##          Neg Pred Value : 0.7767          
##              Prevalence : 0.6111          
##          Detection Rate : 0.5259          
##    Detection Prevalence : 0.6185          
##       Balanced Accuracy : 0.8113          
##                                           
##        'Positive' Class : CH              
## 
1-0.8222
## [1] 0.1778

The training error rate is 0.1612 and the test error rate is 0.1778.

f. Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for “gamma”.

set.seed(100)
svm.radial <- svm(Purchase ~ ., data = train, kernel = "radial", cost = 0.01)
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "radial", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.01 
## 
## Number of Support Vectors:  629
## 
##  ( 317 312 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

There are 629 support vectors. 317 belong to the CH class and 312 belong to the MM class.

train.preds <- predict(svm.radial, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 488 312
##         MM   0   0
##                                          
##                Accuracy : 0.61           
##                  95% CI : (0.5752, 0.644)
##     No Information Rate : 0.61           
##     P-Value [Acc > NIR] : 0.5155         
##                                          
##                   Kappa : 0              
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 1.00           
##             Specificity : 0.00           
##          Pos Pred Value : 0.61           
##          Neg Pred Value :  NaN           
##              Prevalence : 0.61           
##          Detection Rate : 0.61           
##    Detection Prevalence : 1.00           
##       Balanced Accuracy : 0.50           
##                                          
##        'Positive' Class : CH             
## 
1-0.61
## [1] 0.39
test.preds <- predict(svm.radial, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 165 105
##         MM   0   0
##                                           
##                Accuracy : 0.6111          
##                  95% CI : (0.5501, 0.6696)
##     No Information Rate : 0.6111          
##     P-Value [Acc > NIR] : 0.5267          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.0000          
##          Pos Pred Value : 0.6111          
##          Neg Pred Value :    NaN          
##              Prevalence : 0.6111          
##          Detection Rate : 0.6111          
##    Detection Prevalence : 1.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : CH              
## 
1-0.6111
## [1] 0.3889

The training error is 0.39 and the test error is 0.3889 when we use a radial SVM with a cost of 0.01.
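Both confusion matrices show that this model predicts CH for every observation, so its error rate is simply the share of MM purchases. The sketch below checks that against the class counts read off the matrices. The names n.train, n.test and baseline.error are illustrative assumptions.

```r
# Class counts read from the reference columns of the confusion matrices above
n.train <- c(CH = 488, MM = 312)
n.test <- c(CH = 165, MM = 105)

# Error rate of always predicting the most frequent class
baseline.error <- function(counts) 1 - max(counts) / sum(counts)

base.train <- baseline.error(n.train)
base.test <- baseline.error(n.test)
print(c(train = base.train, test = base.test))
```

These match the reported 0.39 and 0.3889, which is consistent with the Kappa of 0 in both outputs.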

set.seed(100)
tune.cost <- tune(svm, Purchase ~ ., data = train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.5)))
summary(tune.cost)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.16 
## 
## - Detailed performance results:
##          cost   error dispersion
## 1  0.01000000 0.39000 0.03809710
## 2  0.03162278 0.33375 0.06562996
## 3  0.10000000 0.18500 0.04440971
## 4  0.31622777 0.17250 0.03670453
## 5  1.00000000 0.16000 0.03216710
## 6  3.16227766 0.17125 0.03729108
## 7 10.00000000 0.17625 0.02791978

The optimal cost appears to be 1.

set.seed(100)
# Refit with the cost selected by tune(); the component is named best.parameters
svm.radial <- svm(Purchase ~ ., kernel = "radial", data = train, cost = tune.cost$best.parameters$cost)
train.preds <- predict(svm.radial, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 448  69
##         MM  40 243
##                                          
##                Accuracy : 0.8638         
##                  95% CI : (0.838, 0.8868)
##     No Information Rate : 0.61           
##     P-Value [Acc > NIR] : < 2e-16        
##                                          
##                   Kappa : 0.7088         
##                                          
##  Mcnemar's Test P-Value : 0.00732        
##                                          
##             Sensitivity : 0.9180         
##             Specificity : 0.7788         
##          Pos Pred Value : 0.8665         
##          Neg Pred Value : 0.8587         
##              Prevalence : 0.6100         
##          Detection Rate : 0.5600         
##    Detection Prevalence : 0.6462         
##       Balanced Accuracy : 0.8484         
##                                          
##        'Positive' Class : CH             
## 
1-0.8638
## [1] 0.1362
test.preds <- predict(svm.radial, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 147  32
##         MM  18  73
##                                           
##                Accuracy : 0.8148          
##                  95% CI : (0.7633, 0.8593)
##     No Information Rate : 0.6111          
##     P-Value [Acc > NIR] : 4.049e-13       
##                                           
##                   Kappa : 0.6007          
##                                           
##  Mcnemar's Test P-Value : 0.06599         
##                                           
##             Sensitivity : 0.8909          
##             Specificity : 0.6952          
##          Pos Pred Value : 0.8212          
##          Neg Pred Value : 0.8022          
##              Prevalence : 0.6111          
##          Detection Rate : 0.5444          
##    Detection Prevalence : 0.6630          
##       Balanced Accuracy : 0.7931          
##                                           
##        'Positive' Class : CH              
## 
1-0.8148
## [1] 0.1852

Using the radial kernel with a cost of 1, the training error is 0.1362 and the test error is 0.1852.
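The error rates above were obtained by subtracting the rounded accuracy from 1 by hand. As a minimal sketch, the same figures can be computed from the confusion-matrix counts themselves. The matrices below are rebuilt from the counts printed above, and the helper name `error_rate` is hypothetical, introduced only for illustration.

```r
# Hypothetical helper: misclassification rate from a confusion matrix
# (rows = predicted class, columns = reference class).
error_rate <- function(cm) {
  1 - sum(diag(cm)) / sum(cm)
}

# Counts copied from the radial-kernel confusion matrices printed above.
labels <- list(Prediction = c("CH", "MM"), Reference = c("CH", "MM"))
train_cm <- matrix(c(448, 69, 40, 243), nrow = 2, byrow = TRUE, dimnames = labels)
test_cm  <- matrix(c(147, 32, 18, 73),  nrow = 2, byrow = TRUE, dimnames = labels)

train_err <- error_rate(train_cm)
test_err  <- error_rate(test_cm)
print(round(c(train = train_err, test = test_err), 4))
```

With the fitted objects in scope, `mean(train.preds != train$Purchase)` gives the same quantity without going through the confusion matrix.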

g. Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.

set.seed(100)
svm.poly <- svm(Purchase ~ ., data = train, kernel = "polynomial", cost = 0.01)
summary(svm.poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "polynomial", 
##     cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.01 
##      degree:  3 
##      coef.0:  0 
## 
## Number of Support Vectors:  618
## 
##  ( 311 307 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

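There are 618 support vectors: 311 in the CH class and 307 in the MM class. Note that the call above does not pass degree = 2, so the model uses the default polynomial degree of 3, as the summary shows. The results that follow are for that cubic kernel.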
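The exercise asks for a polynomial kernel of degree 2, while the summary above reports degree 3. The following is a hedged sketch of how the quadratic kernel could be requested with `svm()` from e1071. It assumes the data come from the `OJ` data set in the ISLR package and draws its own 800-observation training sample, because the original split is not shown here. The names `train_idx`, `train_demo`, and `svm.poly2` are hypothetical, and the numbers it produces will not match the output in this section.

```r
library(e1071)  # svm()
library(ISLR)   # OJ data set (assumed source of the Purchase response)

set.seed(100)
# Assumption: an 800-row training sample, matching the 800 training
# observations seen above; the actual split used earlier is unknown.
train_idx  <- sample(nrow(OJ), 800)
train_demo <- OJ[train_idx, ]

# Passing degree = 2 explicitly requests the quadratic kernel;
# without it, svm() falls back to its default degree of 3.
svm.poly2 <- svm(Purchase ~ ., data = train_demo, kernel = "polynomial",
                 degree = 2, cost = 0.01)
summary(svm.poly2)
```

The same `degree = 2` argument can be passed through `tune()` so that the cross-validated search also uses the quadratic kernel.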

train.preds <- predict(svm.poly, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 483 294
##         MM   5  18
##                                           
##                Accuracy : 0.6262          
##                  95% CI : (0.5917, 0.6599)
##     No Information Rate : 0.61            
##     P-Value [Acc > NIR] : 0.1826          
##                                           
##                   Kappa : 0.057           
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.98975         
##             Specificity : 0.05769         
##          Pos Pred Value : 0.62162         
##          Neg Pred Value : 0.78261         
##              Prevalence : 0.61000         
##          Detection Rate : 0.60375         
##    Detection Prevalence : 0.97125         
##       Balanced Accuracy : 0.52372         
##                                           
##        'Positive' Class : CH              
## 
1-.6262
## [1] 0.3738
test.preds <- predict(svm.poly, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 163  94
##         MM   2  11
##                                           
##                Accuracy : 0.6444          
##                  95% CI : (0.5842, 0.7015)
##     No Information Rate : 0.6111          
##     P-Value [Acc > NIR] : 0.1442          
##                                           
##                   Kappa : 0.1102          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.9879          
##             Specificity : 0.1048          
##          Pos Pred Value : 0.6342          
##          Neg Pred Value : 0.8462          
##              Prevalence : 0.6111          
##          Detection Rate : 0.6037          
##    Detection Prevalence : 0.9519          
##       Balanced Accuracy : 0.5463          
##                                           
##        'Positive' Class : CH              
## 
1-0.6444
## [1] 0.3556

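Using the polynomial kernel with a cost of 0.01, the training error is 0.3738 and the test error is 0.3556. The model assigns almost every observation to CH, so its accuracy is barely above the no-information rate of about 0.61.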

set.seed(100)
tune.cost <- tune(svm, Purchase ~ ., data = train, kernel = "polynomial", ranges = list(cost = 10^seq(-2, 1, by = 0.5)))
summary(tune.cost)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.175 
## 
## - Detailed performance results:
##          cost   error dispersion
## 1  0.01000000 0.37375 0.04387878
## 2  0.03162278 0.33875 0.04543387
## 3  0.10000000 0.28625 0.04226652
## 4  0.31622777 0.19625 0.04126894
## 5  1.00000000 0.18375 0.04210189
## 6  3.16227766 0.18000 0.04377975
## 7 10.00000000 0.17500 0.03333333

The optimal cost appears to be 10. This is the largest value in the grid, so a higher cost might lower the cross-validation error further.
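The selection made by `tune()` amounts to picking the grid value with the lowest cross-validation error. The sketch below reproduces that step from the table printed above and checks whether the winner sits on the upper edge of the grid. The names `perf`, `best`, and `at_upper_edge` are hypothetical.

```r
# Cross-validation results copied from the tuning summary above.
perf <- data.frame(
  cost  = 10^seq(-2, 1, by = 0.5),
  error = c(0.37375, 0.33875, 0.28625, 0.19625, 0.18375, 0.18000, 0.17500)
)

# Row with the lowest cross-validation error.
best <- perf[which.min(perf$error), ]

# TRUE when the selected cost is the largest value that was tried.
at_upper_edge <- isTRUE(all.equal(best$cost, max(perf$cost)))

print(best)
print(at_upper_edge)
```

When the selected value lies on the boundary, one common follow-up is to rerun the search on a wider grid, for example `10^seq(-2, 2, by = 0.5)`. That extension is a suggestion and was not part of the original analysis.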

set.seed(100)
svm.poly <- svm(Purchase ~ ., kernel = "polynomial", data = train, cost = tune.cost$best.parameters$cost)
train.preds <- predict(svm.poly, train)
caret=confusionMatrix(as.factor(train.preds),as.factor(train$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 452  68
##         MM  36 244
##                                           
##                Accuracy : 0.87            
##                  95% CI : (0.8447, 0.8925)
##     No Information Rate : 0.61            
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7216          
##                                           
##  Mcnemar's Test P-Value : 0.002367        
##                                           
##             Sensitivity : 0.9262          
##             Specificity : 0.7821          
##          Pos Pred Value : 0.8692          
##          Neg Pred Value : 0.8714          
##              Prevalence : 0.6100          
##          Detection Rate : 0.5650          
##    Detection Prevalence : 0.6500          
##       Balanced Accuracy : 0.8541          
##                                           
##        'Positive' Class : CH              
## 
1-0.87
## [1] 0.13
test.preds <- predict(svm.poly, test)
caret=confusionMatrix(as.factor(test.preds),as.factor(test$Purchase))
caret
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 144  35
##         MM  21  70
##                                           
##                Accuracy : 0.7926          
##                  95% CI : (0.7393, 0.8394)
##     No Information Rate : 0.6111          
##     P-Value [Acc > NIR] : 1.324e-10       
##                                           
##                   Kappa : 0.5528          
##                                           
##  Mcnemar's Test P-Value : 0.08235         
##                                           
##             Sensitivity : 0.8727          
##             Specificity : 0.6667          
##          Pos Pred Value : 0.8045          
##          Neg Pred Value : 0.7692          
##              Prevalence : 0.6111          
##          Detection Rate : 0.5333          
##    Detection Prevalence : 0.6630          
##       Balanced Accuracy : 0.7697          
##                                           
##        'Positive' Class : CH              
## 
1-0.7926
## [1] 0.2074

Using the polynomial kernel with a cost of 10, the training error is 0.13 and the test error is 0.2074.

h. Overall, which approach seems to give the best results on this data?

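The linear kernel appears to work best on this data if the priority is accurate prediction on the test set. Of the two non-linear kernels, the tuned radial kernel (test error 0.1852) does better than the tuned polynomial kernel (test error 0.2074).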
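One way to make this comparison explicit is to collect the test error rates in a named vector and sort them. The sketch below uses only the figures reported in this section. The linear-kernel results come from earlier parts that are not reproduced here and would be added as further entries in the same way. The names `test_error`, `ranked`, and `best_model` are hypothetical.

```r
# Test error rates reported in this section (1 - test-set accuracy).
test_error <- c(
  "radial, tuned cost"      = 0.1852,
  "polynomial, cost = 0.01" = 0.3556,
  "polynomial, tuned cost"  = 0.2074
)

# Sort ascending so the model with the lowest test error comes first.
ranked <- sort(test_error)
print(ranked)

best_model <- names(ranked)[1]
print(best_model)
```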