We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.
set.seed(1)
# Generate n=500 and p =2
X1 <- runif(500, -0.5, 0.5)
X2 <- runif(500, -0.5, 0.5)
# Quadratic
ClassLabel <- as.factor(ifelse(X1^2 - X2^2 > 0, 1, 0))
# Assemble into a data frame
DataSet <- data.frame(X1 = X1, X2 = X2, ClassLabel = ClassLabel)
Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
library(ggplot2)
# Plot the observation
ggplot(DataSet, aes(x = X1, y = X2, color = ClassLabel)) +
geom_point() +
labs(
x = "X1",
y = "X2"
) +
theme_minimal()
Fit a logistic regression model to the data, using X1 and X2 as predictors.
# Fit a linear logistic regression
LogisticModelLinear <- glm(ClassLabel ~ X1 + X2, data = DataSet, family = binomial)
summary(LogisticModelLinear)
##
## Call:
## glm(formula = ClassLabel ~ X1 + X2, family = binomial, data = DataSet)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260 0.089579 -0.974 0.330
## X1 0.196199 0.316864 0.619 0.536
## X2 -0.002854 0.305712 -0.009 0.993
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 691.79 on 497 degrees of freedom
## AIC: 697.79
##
## Number of Fisher Scoring iterations: 3
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
# Compute predicted probabilities and convert to class labels
DataSet$PredProbLinear <- predict(LogisticModelLinear, type = "response")
DataSet$PredClassLinear <- as.factor(ifelse(DataSet$PredProbLinear > 0.5, 1, 0))
# Plot predicted classes
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassLinear)) +
geom_point() +
labs(
title = "Predicted Classes (Linear Logistic)",
x = "X1",
y = "X2"
) +
theme_minimal()
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X21 , X1 ×X2, log(X2), and so forth).
# alternative with log, shift X2 up by 0.51 so it's >0
LogisticModelNonLinear2 <- glm(
ClassLabel ~ I(X1^2) + I(X1 * X2) + I(log(X2+0.5)),
data = DataSet,
family = binomial
)
summary(LogisticModelNonLinear2)
##
## Call:
## glm(formula = ClassLabel ~ I(X1^2) + I(X1 * X2) + I(log(X2 +
## 0.5)), family = binomial, data = DataSet)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2832 0.1991 -6.445 1.16e-10 ***
## I(X1^2) 24.7078 2.2214 11.122 < 2e-16 ***
## I(X1 * X2) -0.2343 1.5664 -0.150 0.881
## I(log(X2 + 0.5)) 0.6839 0.1380 4.956 7.19e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 452.45 on 496 degrees of freedom
## AIC: 460.45
##
## Number of Fisher Scoring iterations: 5
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not,then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
# Compute predicted probabilities and class labels
DataSet$PredProbNonLin <- predict(LogisticModelNonLinear2, type = "response")
DataSet$PredClassNonLin <- as.factor(ifelse(DataSet$PredProbNonLin > 0.5, 1, 0))
# Plot the new predicted classes
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassNonLin)) +
geom_point() +
labs(
x = "X1",
y = "X2"
) +
theme_minimal()
Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
library(e1071)
library(ggplot2)
# Fit SVM with a linear kernel
SVMModelLinear <- svm(
ClassLabel ~ X1 + X2,
data = DataSet,
#kernel = "linear",
kernel = "radial",
cost = 10
)
# Predict on the training data
DataSet$PredClassSVMLinear <- predict(SVMModelLinear, newdata = DataSet)
summary(DataSet$PredClassSVMLinear)
## 0 1
## 259 241
# Plot the linear SVM decision regions
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassSVMLinear)) +
geom_point() +
labs(
x = "X1",
y = "X2"
) +
theme_minimal()
Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
# Fit SVM with a radial (Gaussian) kernel
SVMModelRadial <- svm(
ClassLabel ~ X1 + X2,
data = DataSet,
kernel = "radial",
gamma = 1, # width parameter for Gaussian kernel
cost = 1
)
# Predict on the training data
DataSet$PredClassSVMRadial <- predict(SVMModelRadial, newdata = DataSet)
summary(DataSet$PredClassSVMRadial)
## 0 1
## 269 231
# Plot the radial SVM decision regions
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassSVMRadial)) +
geom_point() +
labs(
x = "X1",
y = "X2"
) +
theme_minimal()
Answer:Linear models misclassify points near the curved boundary. non-linear logistic regression and radial SVM are more accurate.
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
library(ISLR2)
library(e1071)
data("Auto", package = "ISLR2")
AutoData <- na.omit(Auto)
# Create a binary variable
MedianMPG <- median(AutoData$mpg)
AutoData$HighMPG <- as.factor(ifelse(AutoData$mpg > MedianMPG, 1, 0))
# Remove original mpg variable
AutoData$mpg <- NULL
head(AutoData)
## cylinders displacement horsepower weight acceleration year origin
## 1 8 307 130 3504 12.0 70 1
## 2 8 350 165 3693 11.5 70 1
## 3 8 318 150 3436 11.0 70 1
## 4 8 304 150 3433 12.0 70 1
## 5 8 302 140 3449 10.5 70 1
## 6 8 429 198 4341 10.0 70 1
## name HighMPG
## 1 chevrolet chevelle malibu 0
## 2 buick skylark 320 0
## 3 plymouth satellite 0
## 4 amc rebel sst 0
## 5 ford torino 0
## 6 ford galaxie 500 0
Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results. Note you will need to fit the classifier without the gas mileage variable to produce sensible results.
Answer: The lowest Cross-Validation error occurs at cost = 0.1.
set.seed(1)
LinearTune <- tune(svm, HighMPG ~ ., data = AutoData, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
# Summary cross-validation results
summary(LinearTune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.08673077
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.08923077 0.04698309
## 2 1e-01 0.08673077 0.04040897
## 3 1e+00 0.09961538 0.04923181
## 4 1e+01 0.11237179 0.05701890
## 5 1e+02 0.11750000 0.06208951
Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
Answer: The radial kernel achieves the lowest error, while the polynomial kernel performs best at degree = 3, cost = 10.
set.seed(1)
# SVM with radial
RadialTune <- tune(svm, HighMPG ~ ., data = AutoData, kernel = "radial", ranges = list(cost = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1)))
#summary
summary(RadialTune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 10 1
##
## - best performance: 0.07897436
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 0.1 0.01 0.11224359 0.03836937
## 2 1.0 0.01 0.08673077 0.04551036
## 3 10.0 0.01 0.08673077 0.03855882
## 4 0.1 0.10 0.08666667 0.04193895
## 5 1.0 0.10 0.08923077 0.04376306
## 6 10.0 0.10 0.08416667 0.05256241
## 7 0.1 1.00 0.55115385 0.04366593
## 8 1.0 1.00 0.07903846 0.04891067
## 9 10.0 1.00 0.07897436 0.04869339
set.seed(1)
# SVM with Polynomial
PolyTune <- tune(svm, HighMPG ~ ., data = AutoData, kernel = "polynomial", ranges = list(cost = c(0.1, 1, 10), degree = c(2, 3, 4)))
#Summary
summary(PolyTune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 10 2
##
## - best performance: 0.520641
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.1 2 0.5511538 0.04366593
## 2 1.0 2 0.5511538 0.04366593
## 3 10.0 2 0.5206410 0.08505283
## 4 0.1 3 0.5511538 0.04366593
## 5 1.0 3 0.5511538 0.04366593
## 6 10.0 3 0.5511538 0.04366593
## 7 0.1 4 0.5511538 0.04366593
## 8 1.0 4 0.5511538 0.04366593
## 9 10.0 4 0.5511538 0.04366593
library(ggplot2)
# Extract best models
BestLinearModel <- LinearTune$best.model
BestRadialModel <- RadialTune$best.model
# Plot linear SVM on Horsepower and Weight
plot(BestLinearModel, AutoData, horsepower ~ weight, main = "Linear SVM")
# Plot radial SVM on Weight and Displacement
plot(BestRadialModel, AutoData, weight ~ displacement, main = "Radial SVM")
This problem involves the OJ data set which is part of the ISLR2 package.
Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
library(ISLR2)
library(e1071)
set.seed(1)
TotalRows <- nrow(OJ)
TrainIndices <- sample(seq_len(TotalRows), size = 800)
OJTrain <- OJ[TrainIndices, ] # 800 samples for train
OJTest <- OJ[-TrainIndices, ]
head(OJTrain)
## Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM SpecialCH
## 1017 CH 236 7 1.75 1.99 0.00 0.40 0
## 679 CH 277 1 1.99 2.13 0.24 0.00 0
## 129 CH 267 2 1.86 2.18 0.00 0.40 0
## 930 MM 236 3 1.79 2.09 0.00 0.00 0
## 471 CH 274 7 1.86 2.13 0.47 0.54 1
## 299 MM 231 2 1.69 1.69 0.30 0.00 1
## SpecialMM LoyalCH SalePriceMM SalePriceCH PriceDiff Store7 PctDiscMM
## 1017 0 0.908732 1.59 1.75 -0.16 Yes 0.201005
## 679 0 0.585435 2.13 1.75 0.38 No 0.000000
## 129 1 0.890772 1.78 1.86 -0.08 No 0.183486
## 930 0 0.163840 2.09 1.79 0.30 No 0.000000
## 471 0 0.500000 1.59 1.39 0.20 Yes 0.253521
## 299 0 0.467200 1.69 1.39 0.30 No 0.000000
## PctDiscCH ListPriceDiff STORE
## 1017 0.000000 0.24 0
## 679 0.120603 0.14 1
## 129 0.000000 0.32 2
## 930 0.000000 0.30 3
## 471 0.252688 0.27 0
## 299 0.177515 0.00 2
Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
There are pretty large number of Suspport Vectors(=435), it could be very flexible.
# SVM with cost = 0.01
SVMLinearInitial <- svm(Purchase ~ ., data = OJTrain, kernel = "linear", cost = 0.01)
# Summary
summary(SVMLinearInitial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
What are the training and test error rates?
# Predictions on training set
PredTrainLinearInit <- predict(SVMLinearInitial, newdata = OJTrain)
TrainErrorLinearInit <- mean(PredTrainLinearInit != OJTrain$Purchase)
# Predictions on test set
PredTestLinearInit <- predict(SVMLinearInitial, newdata = OJTest)
TestErrorLinearInit <- mean(PredTestLinearInit != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorLinearInit))
##
## >> Training Error Rate: 0.175000
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorLinearInit))
##
## >> Test Error Rate: 0.177778
Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
# Cross‐validation
LinearTune <- tune(svm, Purchase ~ ., data = OJTrain, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 10)))
# Best cost
BestCostLinear <- LinearTune$best.parameters$cost
BestCostLinear
## [1] 10
Compute the training and test error rates using this new value for cost.
# fit with new value
SVMLinearTuned <- LinearTune$best.model
# Compute errors
TrainErrorLinearTuned <- mean(predict(SVMLinearTuned, OJTrain) != OJTrain$Purchase)
TestErrorLinearTuned <- mean(predict(SVMLinearTuned, OJTest) != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorLinearTuned))
##
## >> Training Error Rate: 0.163750
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorLinearTuned))
##
## >> Test Error Rate: 0.148148
Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
# Init radial SVM (cost = 0.01, default gamma)
SVMRadialInitial <- svm(Purchase ~ ., data = OJTrain, kernel = "radial", cost = 0.01)
#Summary
summary(SVMRadialInitial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "radial", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
##
## Number of Support Vectors: 634
##
## ( 319 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
# Init errors
TrainErrorRadialInit <- mean(predict(SVMRadialInitial, OJTrain) != OJTrain$Purchase)
TestErrorRadialInit <- mean(predict(SVMRadialInitial, OJTest) != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorRadialInit))
##
## >> Training Error Rate: 0.393750
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorRadialInit))
##
## >> Test Error Rate: 0.377778
# Tune cost for radial kernel
RadialTune <- tune(svm, Purchase ~ ., data = OJTrain, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 10)))
BestCostRadial <- RadialTune$best.parameters$cost
# Errors with tuned radial SVM
SVMRadialTuned <- RadialTune$best.model
TrainErrorRadialTuned <- mean(predict(SVMRadialTuned, OJTrain) != OJTrain$Purchase)
TestErrorRadialTuned <- mean(predict(SVMRadialTuned, OJTest) != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorRadialTuned))
##
## >> Training Error Rate: 0.151250
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorRadialTuned))
##
## >> Test Error Rate: 0.185185
Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.
# Init polynomial SVM (degree = 2)
SVMPolyInitial <- svm(Purchase ~ ., data = OJTrain, kernel = "polynomial", degree = 2, cost = 0.01)
summary(SVMPolyInitial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "polynomial",
## degree = 2, cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.01
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 636
##
## ( 321 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
# Init errors
TrainErrorPolyInit <- mean(predict(SVMPolyInitial, OJTrain) != OJTrain$Purchase)
TestErrorPolyInit <- mean(predict(SVMPolyInitial, OJTest) != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorPolyInit))
##
## >> Training Error Rate: 0.372500
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorPolyInit))
##
## >> Test Error Rate: 0.366667
# Tune cost for polynomial kernel
PolyTune <- tune(svm, Purchase ~ ., data = OJTrain, kernel = "polynomial", degree = 2,ranges = list(cost = c(0.01, 0.1, 1, 10)))
BestCostPoly <- PolyTune$best.parameters$cost
# Errors with tuned polynomial SVM
SVMPolyTuned <- PolyTune$best.model
TrainErrorPolyTuned <- mean(predict(SVMPolyTuned, OJTrain) != OJTrain$Purchase)
TestErrorPolyTuned <- mean(predict(SVMPolyTuned, OJTest) != OJTest$Purchase)
cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorPolyTuned))
##
## >> Training Error Rate: 0.150000
cat(sprintf("\n>> Test Error Rate: %f\n\n", TestErrorPolyTuned))
##
## >> Test Error Rate: 0.188889
Overall, which approach seems to give the best results on this data?
Answer: The radial‐kernel SVM with tuned cost yields the lowest test error, so it performs best on this data