5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
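Why this gives a non-linear boundary: logistic regression models the log-odds as a linear function of whatever features it is supplied, so supplying a squared term makes the boundary quadratic in the original coordinates. For instance, with features $X_1$, $X_1^2$ and $X_2$ the decision boundary is

$$\beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 = 0 \quad\Longleftrightarrow\quad X_2 = -\frac{\beta_0 + \beta_1 X_1 + \beta_2 X_1^2}{\beta_3},$$

a parabola in the $(X_1, X_2)$ plane.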
a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(ggthemes)
library(caret)
library(e1071)
theme_set(theme_tufte(base_size = 14))
# n = 500 observations on p = 2 standard-normal features
# (calling set.seed() beforehand would make the simulation reproducible)
df <- data.frame(replicate(2, rnorm(500)))
df <- as_tibble(df)
# quadratic decision rule: Class A lies inside the parabola x^2 + y < 1
class_func <- function(x, y) {
  x^2 + y < 1
}
df <- df %>%
  rename(Var1 = X1, Var2 = X2) %>%
  mutate(Class = ifelse(class_func(Var1, Var2),
                        'Class A',
                        'Class B'),
         Class = factor(Class))
b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.
# 60/40 train/test split, used for the model fits in the later parts
inTrain <- sample(nrow(df), nrow(df) * 0.6, replace = FALSE)
training <- df[inTrain, ]
testing <- df[-inTrain, ]
ggplot(df, aes(Var1, Var2, col = Class)) +
  geom_point(size = 2)

c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
logreg_fit <- glm(Class ~ ., data = training, family = 'binomial')
summary(logreg_fit)
##
## Call:
## glm(formula = Class ~ ., family = "binomial", data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8249 -0.7919 -0.2596 0.6825 2.4252
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4213 0.1505 -2.800 0.00512 **
## Var1 -0.1080 0.1429 -0.756 0.44992
## Var2 2.0213 0.2400 8.421 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 412.47 on 299 degrees of freedom
## Residual deviance: 279.31 on 297 degrees of freedom
## AIC: 285.31
##
## Number of Fisher Scoring iterations: 5
Only Var2 is significant here: the classes depend on Var1 only through its square, so a purely linear term in Var1 carries little information and this model cannot capture the quadratic boundary.
d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
# glm() models P(Class B), the second factor level, so p >= 0.5 maps to Class B;
# accuracy is computed on the held-out test set
pred <- predict(logreg_fit, testing, type = 'response')
pred <- ifelse(pred >= 0.5, 'Class B', 'Class A')
mean(pred == testing$Class)
## [1] 0.73
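The accuracy above is computed on the test split; the plot the exercise actually asks for (the training observations colored by their predicted class, which shows a linear boundary) can be produced along these lines, a sketch reusing the objects defined above:
training %>%
  mutate(Pred = ifelse(predict(logreg_fit, type = 'response') >= 0.5,
                       'Class B', 'Class A')) %>%
  ggplot(aes(Var1, Var2, col = Pred)) +
  geom_point(size = 2)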
e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.
logreg_fit <- glm(Class ~ poly(Var1, 2) + Var2, data = training, family = 'binomial')
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
These warnings signal (near-)perfect separation: with the correct quadratic term the two classes can be separated almost exactly, so the coefficient estimates diverge and their standard errors blow up (see the summary below), but the predicted class labels remain useful.
summary(logreg_fit)
##
## Call:
## glm(formula = Class ~ poly(Var1, 2) + Var2, family = "binomial",
## data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.848e-04 -2.000e-08 -2.000e-08 2.000e-08 9.595e-04
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 64.51 5701.25 0.011 0.991
## poly(Var1, 2)1 -2363.93 127037.62 -0.019 0.985
## poly(Var1, 2)2 25924.46 1078037.74 0.024 0.981
## Var2 912.27 37674.57 0.024 0.981
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4.1247e+02 on 299 degrees of freedom
## Residual deviance: 1.9387e-06 on 296 degrees of freedom
## AIC: 8
##
## Number of Fisher Scoring iterations: 25
f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
pred <- predict(logreg_fit, testing, type = 'response')
pred <- ifelse(pred >= 0.5, 'Class B', 'Class A')
mean(pred == testing$Class)
## [1] 0.995
tibble(pred = pred, class = testing$Class) %>%
  ggplot(aes(pred, class)) +
  geom_jitter()

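To see the quadratic boundary directly, the predicted class can be evaluated over a grid of feature values and plotted together with the training data (a sketch, reusing logreg_fit from part (e)):
grid <- expand.grid(Var1 = seq(min(df$Var1), max(df$Var1), length.out = 200),
                    Var2 = seq(min(df$Var2), max(df$Var2), length.out = 200))
grid$Pred <- ifelse(predict(logreg_fit, grid, type = 'response') >= 0.5,
                    'Class B', 'Class A')
ggplot() +
  geom_point(data = grid, aes(Var1, Var2, col = Pred), size = 0.3, alpha = 0.3) +
  geom_point(data = training, aes(Var1, Var2, col = Class), size = 2)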
g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm_fit <- svm(Class ~ ., data = training,
               kernel = 'linear',
               scale = FALSE)
plot(svm_fit, testing)

mean(predict(svm_fit, testing) == testing$Class)
## [1] 0.745
h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm_poly <- svm(Class ~ ., data = training,
                kernel = 'polynomial', degree = 2,
                scale = FALSE)
plot(svm_poly, testing)

mean(predict(svm_poly, testing) == testing$Class)
## [1] 0.79
svm_radial <- svm(Class ~ ., data = training,
                  kernel = 'radial',
                  scale = FALSE)
plot(svm_radial, testing)

mean(predict(svm_radial, testing) == testing$Class)
## [1] 0.96
i) Comment on your results: Support vector machines and logistic regression behave very similarly here. On the test set, the logistic regression with the correct quadratic term performs best (accuracy 0.995), followed by the radial-kernel SVM (0.96); the methods restricted to linear boundaries, the support vector classifier (0.745) and the linear logistic regression (0.73), do markedly worse, with the degree-2 polynomial SVM in between (0.79). When the right feature transformation is known, logistic regression with those covariates is hard to beat; when it is not, a flexible kernel such as the radial basis comes close.
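For reference, the test accuracies quoted above can be collected in one place (a sketch, assuming the fitted objects from parts (c)-(h) are still in the workspace; logreg_fit here is the quadratic fit from part (e)):
test_acc <- c(
  logistic_quadratic = mean(ifelse(predict(logreg_fit, testing, type = 'response') >= 0.5,
                                   'Class B', 'Class A') == testing$Class),
  svm_linear         = mean(predict(svm_fit, testing) == testing$Class),
  svm_polynomial_2   = mean(predict(svm_poly, testing) == testing$Class),
  svm_radial         = mean(predict(svm_radial, testing) == testing$Class))
sort(test_acc, decreasing = TRUE)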
7. d) Make some plots to back up your assertions in (b) and (c).
plot(svm_linear)

plot(svm_poly)

plot(svm_radial)

postResample(predict(svm_linear), Auto$highmpg)
## Accuracy Kappa
## 0.9285714 0.8571429
postResample(predict(svm_poly), Auto$highmpg)
## Accuracy Kappa
## 0.9540816 0.9081633
postResample(predict(svm_radial), Auto$highmpg)
## Accuracy Kappa
## 0.9744898 0.9489796
8. This problem involves the OJ data set which is part of the ISLR package.
a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(1)
data('OJ')
inTrain <- sample(nrow(OJ), 800, replace = FALSE)
training <- OJ[inTrain,]
testing <- OJ[-inTrain,]
b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm_linear <- svm(Purchase ~ ., data = training,
                  kernel = 'linear',
                  cost = 0.01)
summary(svm_linear)
##
## Call:
## svm(formula = Purchase ~ ., data = training, kernel = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
c) What are the training and test error rates?
postResample(predict(svm_linear, training), training$Purchase)
## Accuracy Kappa
## 0.8250000 0.6313971
postResample(predict(svm_linear, testing), testing$Purchase)
## Accuracy Kappa
## 0.8222222 0.6082699
Since postResample() reports accuracy rather than error, the training and test error rates are roughly 1 - 0.825 = 0.175 and 1 - 0.822 = 0.178.
d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
svm_linear_tune <- train(Purchase ~ ., data = training,
                         method = 'svmLinear2',
                         trControl = trainControl(method = 'cv', number = 10),
                         preProcess = c('center', 'scale'),
                         tuneGrid = expand.grid(cost = seq(0.01, 10, length.out = 20)))
svm_linear_tune
## Support Vector Machines with Linear Kernel
##
## 800 samples
## 17 predictor
## 2 classes: 'CH', 'MM'
##
## Pre-processing: centered (17), scaled (17)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 721, 720, 720, 720, 721, 719, ...
## Resampling results across tuning parameters:
##
## cost Accuracy Kappa
## 0.0100000 0.8199215 0.6202565
## 0.5357895 0.8273760 0.6360834
## 1.0615789 0.8236101 0.6284665
## 1.5873684 0.8261105 0.6333280
## 2.1131579 0.8261105 0.6333280
## 2.6389474 0.8273605 0.6362121
## 3.1647368 0.8261105 0.6338114
## 3.6905263 0.8248605 0.6309732
## 4.2163158 0.8248605 0.6309732
## 4.7421053 0.8261105 0.6338114
## 5.2678947 0.8273605 0.6361662
## 5.7936842 0.8273605 0.6361662
## 6.3194737 0.8260947 0.6331693
## 6.8452632 0.8260947 0.6331693
## 7.3710526 0.8260947 0.6331693
## 7.8968421 0.8273605 0.6361662
## 8.4226316 0.8273605 0.6361662
## 8.9484211 0.8273605 0.6361662
## 9.4742105 0.8248447 0.6308145
## 10.0000000 0.8248447 0.6308145
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cost = 0.5357895.
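The tuning above is done with caret::train() and method = 'svmLinear2' (the e1071-based linear SVM). For reference, an equivalent call with the tune() function that the exercise mentions might look like the sketch below; tune() performs 10-fold cross-validation by default.
set.seed(1)
tune_out <- tune(svm, Purchase ~ ., data = training,
                 kernel = 'linear',
                 ranges = list(cost = seq(0.01, 10, length.out = 20)))
summary(tune_out)         # cross-validation error for each cost value
tune_out$best.parameters  # cost with the lowest CV error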
e) Compute the training and test error rates using this new value for cost.
postResample(predict(svm_linear_tune, training), training$Purchase)
## Accuracy Kappa
## 0.8350000 0.6524601
postResample(predict(svm_linear_tune, testing), testing$Purchase)
## Accuracy Kappa
## 0.8444444 0.6585983
With the tuned cost, the training and test error rates drop to about 1 - 0.835 = 0.165 and 1 - 0.844 = 0.156.
f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm_radial <- svm(Purchase ~ ., data = training,
                  kernel = 'radial',   # the argument is 'kernel' (radial is also the default)
                  cost = 0.01)
summary(svm_radial)
##
## Call:
## svm(formula = Purchase ~ ., data = training, kernel = "radial",
##     cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 634
##
## ( 319 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
postResample(predict(svm_radial, training), training$Purchase)
## Accuracy Kappa
## 0.60625 0.00000
postResample(predict(svm_radial, testing), testing$Purchase)
## Accuracy Kappa
## 0.6222222 0.0000000
With cost = 0.01 the radial SVM is so heavily constrained that it predicts the majority class (CH) for every observation: the accuracy simply equals the proportion of CH purchases, and the Kappa of 0 reflects a classifier no better than that baseline.
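A quick way to confirm this (a check not run in the original) is to tabulate the predictions against the true labels:
table(predicted = predict(svm_radial, testing), actual = testing$Purchase)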
svm_radial_tune <- train(Purchase ~ ., data = training,
                         method = 'svmRadial',
                         trControl = trainControl(method = 'cv', number = 10),
                         preProcess = c('center', 'scale'),
                         tuneGrid = expand.grid(C = seq(0.01, 10, length.out = 20),
                                                sigma = 0.05691))
svm_radial_tune
## Support Vector Machines with Radial Basis Function Kernel
##
## 800 samples
## 17 predictor
## 2 classes: 'CH', 'MM'
##
## Pre-processing: centered (17), scaled (17)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 720, 719, 719, 721, 719, 720, ...
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.0100000 0.6062600 0.0000000
## 0.5357895 0.8274369 0.6315267
## 1.0615789 0.8249527 0.6267051
## 1.5873684 0.8199986 0.6165675
## 2.1131579 0.8174982 0.6105624
## 2.6389474 0.8149824 0.6041027
## 3.1647368 0.8112166 0.5964807
## 3.6905263 0.8112166 0.5964807
## 4.2163158 0.8124512 0.5993391
## 4.7421053 0.8137170 0.6021336
## 5.2678947 0.8137174 0.6017074
## 5.7936842 0.8137174 0.6017074
## 6.3194737 0.8124828 0.5988491
## 6.8452632 0.8124828 0.5988491
## 7.3710526 0.8137641 0.6020343
## 7.8968421 0.8112324 0.5967764
## 8.4226316 0.8112324 0.5967764
## 8.9484211 0.8099666 0.5939493
## 9.4742105 0.8124982 0.5992398
## 10.0000000 0.8124982 0.5992398
##
## Tuning parameter 'sigma' was held constant at a value of 0.05691
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.05691 and C = 0.5357895.
postResample(predict(svm_radial_tune, training), training$Purchase)
## Accuracy Kappa
## 0.851250 0.684392
postResample(predict(svm_radial_tune, testing), testing$Purchase)
## Accuracy Kappa
## 0.8185185 0.6040582
g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
svm_poly <- svm(Purchase ~ ., data = training,
                method = 'polynomial', degree = 2,  # NB: svm() has no 'method' argument, so the
                cost = 0.01)                        # default radial kernel is used (see summary below)
summary(svm_poly)
##
## Call:
## svm(formula = Purchase ~ ., data = training, method = "polynomial",
## degree = 2, cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 634
##
## ( 319 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
postResample(predict(svm_poly, training), training$Purchase)
## Accuracy Kappa
## 0.60625 0.00000
postResample(predict(svm_poly, testing), testing$Purchase)
## Accuracy Kappa
## 0.6222222 0.0000000
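Because of the argument slip noted in the code comment above, the summary and accuracies in this part are identical to the radial fit from part (f) rather than a genuine degree-2 polynomial SVM. A corrected call would look like the sketch below; rerunning it (and tuning cost as in the earlier parts) would of course produce different numbers from those shown above.
svm_poly2 <- svm(Purchase ~ ., data = training,
                 kernel = 'polynomial', degree = 2,
                 cost = 0.01)
summary(svm_poly2)
postResample(predict(svm_poly2, training), training$Purchase)
postResample(predict(svm_poly2, testing), testing$Purchase)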
h) Overall, which approach seems to give the best results on this data? Judging by the test-set results above, the support vector classifier with a linear kernel and the tuned cost (about 0.54) does best, with a test accuracy of about 0.844 versus about 0.819 for the tuned radial-kernel SVM; the polynomial results cannot be compared fairly because of the kernel slip noted in part (g). A linear kernel therefore seems to be the best choice for the OJ data.