library(tidyverse)
library(caret)
library(e1071)
library(ISLR2)

Support Vector Machines
5
We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
5A
Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them
x1 <- runif (500) - 0.5
x2 <- runif (500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)

5B
Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
# Binding through a numeric matrix recodes y from 0/1 to factor levels 1/2,
# which is how the class labels appear in the confusion matrices below.
df <- cbind(x1, x2, as.factor(y)) |>
  as.data.frame()
colnames(df) <- c("x1", "x2", "y")
df$y <- as.factor(df$y)
df |> ggplot(aes(x = x1, y = x2, color = y)) +
  geom_point()

5C
Fit a logistic regression model to the data, using X1 and X2 as predictors.
mod.log <- glm(y ~ ., data = df, family = "binomial")

5D
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
df$preds_log <- ifelse(predict(mod.log, type = "response") > 0.5, 2, 1) |>
as.factor()
df |> ggplot(aes(x = x1, y = x2, color = preds_log)) +
  geom_point()

confusionMatrix(df$y, df$preds_log)

Confusion Matrix and Statistics
          Reference
Prediction   1   2
         1 145 105
         2 105 145
Accuracy : 0.58
95% CI : (0.5354, 0.6237)
No Information Rate : 0.5
P-Value [Acc > NIR] : 0.0002002
Kappa : 0.16
Mcnemar's Test P-Value : 1.0000000
Sensitivity : 0.58
Specificity : 0.58
Pos Pred Value : 0.58
Neg Pred Value : 0.58
Prevalence : 0.50
Detection Rate : 0.29
Detection Prevalence : 0.50
Balanced Accuracy : 0.58
'Positive' Class : 1
We can see that the plain logistic model, which is linear in X1 and X2, produces a linear decision boundary and performs poorly on this data (about 58% accuracy).
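To see why the boundary is linear, here is a minimal sketch (assuming the mod.log fit and df built above; the grid object is introduced only for illustration) that evaluates the fitted probabilities on a grid and draws the 0.5 contour:

# Evaluate fitted probabilities on a grid covering the predictor range
grid <- expand.grid(x1 = seq(-0.5, 0.5, length.out = 200),
                    x2 = seq(-0.5, 0.5, length.out = 200))
grid$prob <- predict(mod.log, newdata = grid, type = "response")

# The 0.5 probability contour is the decision boundary; for a model that is
# linear in x1 and x2 it is a straight line.
ggplot(grid, aes(x = x1, y = x2)) +
  geom_contour(aes(z = prob), breaks = 0.5, colour = "black") +
  geom_point(data = df, aes(colour = y))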
5E
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.
Since we created the data, we know that there is a quadratic relationship.
mod.log2 <- glm(formula = y ~ x1 + x2 + I(x1^2) + I(x2^2), family = "binomial", data = df)

Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

These warnings are expected here: the quadratic terms separate the two classes perfectly, so the maximum likelihood estimates diverge and the fitted probabilities are pushed to 0 or 1. The fitted class labels are still usable for prediction.
5F
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
df$preds_log2 <- ifelse(predict(mod.log2, type = "response") > 0.5, 2, 1) |>
as.factor()
df |> ggplot(aes(x = x1, y = x2, color = preds_log2)) +
  geom_point()

The quadratic terms capture the true boundary, and every training observation is classified correctly.
confusionMatrix(df$y, df$preds_log2)

Confusion Matrix and Statistics
          Reference
Prediction   1   2
         1 250   0
         2   0 250
Accuracy : 1
95% CI : (0.9926, 1)
No Information Rate : 0.5
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0
Specificity : 1.0
Pos Pred Value : 1.0
Neg Pred Value : 1.0
Prevalence : 0.5
Detection Rate : 0.5
Detection Prevalence : 0.5
Balanced Accuracy : 1.0
'Positive' Class : 1
5G
Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
mod.svm <- svm(y ~ x1 + x2, data = df, kernel = "linear", cost = 10, scale = FALSE)
df$preds_svm <- predict(mod.svm, type = "response") |>
as.factor()
df |> ggplot(aes(x = x1, y = x2, color = preds_svm)) +
  geom_point()

The linear kernel cannot capture the quadratic boundary, so the predicted labels split the plane with a straight line.
confusionMatrix(df$y, df$preds_svm)

Confusion Matrix and Statistics
          Reference
Prediction   1   2
         1 166  84
         2 111 139
Accuracy : 0.61
95% CI : (0.5657, 0.653)
No Information Rate : 0.554
P-Value [Acc > NIR] : 0.006483
Kappa : 0.22
Mcnemar's Test P-Value : 0.062617
Sensitivity : 0.5993
Specificity : 0.6233
Pos Pred Value : 0.6640
Neg Pred Value : 0.5560
Prevalence : 0.5540
Detection Rate : 0.3320
Detection Prevalence : 0.5000
Balanced Accuracy : 0.6113
'Positive' Class : 1
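As a quick visual check, a minimal sketch (assuming the mod.svm fit and df built above) uses e1071's plot method for svm objects to show the linear boundary and mark the support vectors:

# Subset to the model variables; x2 ~ x1 puts x2 on the vertical axis
plot(mod.svm, df[, c("x1", "x2", "y")], x2 ~ x1)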
5H
Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
set.seed(1111)
idx <- sample(500, 400, replace = F)
tune.out <- tune(svm, y ~ x1 + x2, data = df[idx, ],
  kernel = "radial",
  ranges = list(
    cost = c(0.1, 1, 10, 100, 1000),
    gamma = c(0.5, 1, 2, 3, 4)
  )
)

table(true = df[-idx, "y"],
      pred = predict(tune.out$best.model, newdata = df[-idx, ]))

     pred
true   1  2
   1  58  0
   2   3 39
The tuned radial-kernel SVM misclassifies only 3 of the 100 held-out observations, a test error rate of 3%.
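Computed directly (a small sketch, assuming tune.out and idx from above; test_preds is a name introduced here), the held-out error rate is 3/100:

# Predict on the 100 held-out observations and compute the error rate
test_preds <- predict(tune.out$best.model, newdata = df[-idx, ])
mean(test_preds != df[-idx, "y"])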
5I
Regardless of the model used, understanding the relationship between the predictors and the response is crucial to constructing a high-performing model: the methods that can express the quadratic boundary (logistic regression with quadratic terms, the radial-kernel SVM) fit the data well, while the strictly linear methods do not.
7
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
data("Auto")
df <- Auto
df <- df |>
  select(-name) |>
  mutate(origin = as.factor(origin))

7A
Create a binary response that equals 1 when a car's gas mileage is above the median and 0 otherwise, then drop mpg from the predictors.
df <- df |>
  mutate(response = as.factor(ifelse(mpg > median(mpg), 1, 0))) |>
  select(-mpg)

7B
Fit a support vector classifier with a range of cost values and examine the cross-validation errors.
idx <- sample(392, 300, replace = F)
svm.linear <- tune(svm,
  response ~ .,
  data = df[idx, ],
  kernel = "linear",
  ranges = list(
    cost = c(0.1, 1, 10, 100, 1000)
  )
)

svm.linear$performances |>
  mutate(cost = as.factor(cost)) |>
  ggplot(aes(x = cost, y = error)) +
  geom_col()

The lowest cross-validation error occurs at cost = 10.

svm.linear$best.parameters

  cost
3   10
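For reference (assuming the svm.linear tune object above), the tune() result also stores the cross-validation error of the selected model:

# Cross-validation error achieved at the selected cost
svm.linear$best.performance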
7C
Repeat the tuning with radial and polynomial kernels, searching over cost, gamma, and (for the polynomial) degree.
svm.radial <- tune(svm,
  response ~ .,
  data = df[idx, ],
  kernel = "radial",
  ranges = list(
    cost = c(0.1, 1, 10, 100, 1000),
    gamma = c(0.5, 1, 2, 3, 4)
  )
)

svm.poly <- tune(svm,
  response ~ .,
  data = df[idx, ],
  kernel = "polynomial",
  ranges = list(
    cost = c(0.1, 1, 10, 100, 1000),
    gamma = c(0.5, 1, 2, 3, 4),
    degree = c(1, 2, 3, 4)
  )
)

7D
Plot the cross-validation errors and report the selected parameters for each kernel.
svm.radial$performances |>
  mutate(cost = as.factor(cost)) |>
  ggplot(aes(x = cost, y = error)) +
  geom_col()

svm.radial$best.parameters

  cost gamma
7    1     1

svm.poly$performances |>
  mutate(cost = as.factor(cost)) |>
  ggplot(aes(x = cost, y = error)) +
  geom_col()

svm.poly$best.parameters

  cost gamma degree
4  100   0.5      1
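The radial kernel selects cost = 1 and gamma = 1, while the polynomial kernel selects cost = 100, gamma = 0.5, and degree = 1, i.e. an essentially linear fit. As a sketch (assuming the three tune objects above), the best cross-validation errors can be compared directly:

# Best cross-validation error achieved by each kernel
data.frame(
  kernel   = c("linear", "radial", "polynomial"),
  cv_error = c(svm.linear$best.performance,
               svm.radial$best.performance,
               svm.poly$best.performance)
)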
8
This problem involves the OJ data set which is part of the ISLR2 package.
data("OJ")

8A
Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
idx <- sample(nrow(OJ),800,replace=F)
train <- OJ[idx,]
test <- OJ[-idx, ]

8B
Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm.linear <- svm(Purchase ~ .,
  data = train,
  cost = .01,
  kernel = "linear")

summary(svm.linear)
Call:
svm(formula = Purchase ~ ., data = train, cost = 0.01, kernel = "linear")
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 0.01
Number of Support Vectors: 442
( 221 221 )
Number of Classes: 2
Levels:
CH MM
At cost = 0.01 the margin is wide: 442 of the 800 training observations are support vectors, split almost evenly between the CH and MM classes.

8C
What are the training and test error rates?
data.frame(train_error = mean(predict(svm.linear, train) != train$Purchase),
           test_error = mean(predict(svm.linear, test) != test$Purchase))

  train_error test_error
1      0.1725  0.1666667
The training error rate is about 17.3% and the test error rate is about 16.7%.
8D
Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
svm.linear <- tune(svm,
  Purchase ~ .,
  data = train,
  kernel = "linear",
  ranges = list(
    cost = c(.01, .1, 1, 10)
  )
)

svm.linear$best.parameters

  cost
4   10
Cost = 10 is the selected value.
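The exercise allows any costs in the 0.01 to 10 range; a finer grid can be searched the same way (a sketch, assuming train from above; svm.linear.fine is a name introduced here):

# Search ten logarithmically spaced cost values between 0.01 and 10
svm.linear.fine <- tune(svm, Purchase ~ ., data = train, kernel = "linear",
                        ranges = list(cost = 10^seq(-2, 1, length.out = 10)))
svm.linear.fine$best.parameters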
8E
Compute the training and test error rates using this new value for cost.
data.frame(train_error = mean(predict(svm.linear$best.model, train) != train$Purchase),
           test_error = mean(predict(svm.linear$best.model, test) != test$Purchase))

  train_error test_error
1     0.16625  0.1481481
With the tuned cost, the training error drops slightly to about 16.6% and the test error drops to about 14.8%.
8F
Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm.radial <- svm(Purchase ~ .,
  data = train,
  cost = .01,
  kernel = "radial")

summary(svm.radial)
Call:
svm(formula = Purchase ~ ., data = train, cost = 0.01, kernel = "radial")
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 0.01
Number of Support Vectors: 616
( 309 307 )
Number of Classes: 2
Levels:
CH MM
data.frame(train_error = mean(predict(svm.radial, train) != train$Purchase),
           test_error = mean(predict(svm.radial, test) != test$Purchase))

  train_error test_error
1     0.38375  0.4074074

With cost = 0.01, 616 of the 800 training observations are support vectors and the error rates (about 38% training and 41% test) are close to what one would get by always predicting the more common class.
svm.radial <- tune(svm,
  Purchase ~ .,
  data = train,
  kernel = "radial",
  ranges = list(
    cost = c(.01, .1, 1, 10)
  )
)

svm.radial$best.parameters

  cost
4   10
data.frame(train_error = mean(predict(svm.radial$best.model, train) != train$Purchase),
           test_error = mean(predict(svm.radial$best.model, test) != test$Purchase))

  train_error test_error
1     0.14125  0.1777778

After tuning (cost = 10), the radial kernel's training error falls to about 14.1% and its test error to about 17.8%.
8G
Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.
svm.poly <- svm(Purchase ~ .,
  data = train,
  cost = .01,
  kernel = "polynomial",
  degree = 2)

summary(svm.poly)
Call:
svm(formula = Purchase ~ ., data = train, cost = 0.01, kernel = "polynomial",
degree = 2)
Parameters:
SVM-Type: C-classification
SVM-Kernel: polynomial
cost: 0.01
degree: 2
coef.0: 0
Number of Support Vectors: 621
( 314 307 )
Number of Classes: 2
Levels:
CH MM
data.frame(train_error = mean(predict(svm.poly, train) != train$Purchase),
           test_error = mean(predict(svm.poly, test) != test$Purchase))

  train_error test_error
1     0.38375  0.4074074
svm.poly <- tune(svm,
  Purchase ~ .,
  data = train,
  kernel = "polynomial",
  degree = 2,
  ranges = list(
    cost = c(.01, .1, 1, 10)
  )
)

svm.poly$best.parameters

  cost
4   10
data.frame(train_error = mean(predict(svm.poly$best.model, train) != train$Purchase),
           test_error = mean(predict(svm.poly$best.model, test) != test$Purchase))

  train_error test_error
1     0.15625  0.1740741
8H
Overall, which approach seems to give the best results on this data?
Overall, all three kernels perform similarly after tuning. The radial kernel achieves the lowest training error (about 14.1%), but on this particular train/test split the linear kernel gives the lowest test error (about 14.8%), with the polynomial (about 17.4%) and radial (about 17.8%) kernels close behind. Because no seed was set, the exact ranking can shift with the random split.
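To back this up, a sketch (assuming the tuned svm.linear, svm.radial, and svm.poly objects and the train/test split above) collects the test error rates in one place:

# Test error rate of each tuned model on the held-out observations
sapply(list(linear = svm.linear, radial = svm.radial, polynomial = svm.poly),
       function(fit) mean(predict(fit$best.model, test) != test$Purchase))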