We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
set.seed(1)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)  # class 1 wherever |x1| > |x2|
library(e1071)
library(ggplot2)
ggplot() +
  geom_point(aes(x = x1, y = x2, colour = as.factor(y))) +
  theme_minimal() +
  labs(colour = "Class")
logistic_model <- glm(as.factor(y) ~ x1 + x2, family = "binomial")
summary(logistic_model)
##
## Call:
## glm(formula = as.factor(y) ~ x1 + x2, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.179 -1.139 -1.112 1.206 1.257
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260 0.089579 -0.974 0.330
## x1 0.196199 0.316864 0.619 0.536
## x2 -0.002854 0.305712 -0.009 0.993
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 691.79 on 497 degrees of freedom
## AIC: 697.79
##
## Number of Fisher Scoring iterations: 3
None of the variables are statistically significant.
model_probs <- predict(logistic_model,
                       newdata = data.frame(x1 = x1, x2 = x2, y = as.factor(y)),
                       type = "response")
model_preds <- ifelse(model_probs > 0.5, 1, 0)
ggplot() +
  geom_point(aes(x = x1, y = x2, colour = as.factor(model_preds))) +
  theme_minimal() +
  labs(colour = "Predictions")
non_linear_model <- glm(as.factor(y) ~ I(x1^2) + x2:x1 + log(abs(x2)), family = "binomial")
summary(non_linear_model)
##
## Call:
## glm(formula = as.factor(y) ~ I(x1^2) + x2:x1 + log(abs(x2)),
## family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4319 -0.2839 -0.0653 0.2232 1.8266
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.0227 0.8797 -10.257 <2e-16 ***
## I(x1^2) 42.2778 4.2141 10.033 <2e-16 ***
## log(abs(x2)) -3.5792 0.3861 -9.271 <2e-16 ***
## x2:x1 0.1549 1.7912 0.087 0.931
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 240.46 on 496 degrees of freedom
## AIC: 248.46
##
## Number of Fisher Scoring iterations: 7
This time the intercept, the squared term, and the log term are all highly statistically significant; only the interaction term x2:x1 is not.
non_linear_probs <- predict(non_linear_model,
                            newdata = data.frame(x1 = x1, x2 = x2, y = as.factor(y)),
                            type = "response")
non_linear_preds <- ifelse(non_linear_probs > 0.5, 1, 0)
ggplot() +
  geom_point(aes(x = x1, y = x2, colour = as.factor(non_linear_preds))) +
  theme_minimal() +
  labs(colour = "Predictions")
(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
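The code for part (g) is not shown in this write-up; a minimal sketch, assuming the default cost of 1:

svm_linear <- svm(y ~ ., data = data.frame(x1 = x1, x2 = x2, y = as.factor(y)),
                  kernel = "linear", cost = 1)  # a linear kernel gives the support vector classifier
svm_linear_preds <- predict(svm_linear, data.frame(x1 = x1, x2 = x2, y = as.factor(y)))
ggplot() +
  geom_point(aes(x = x1, y = x2, colour = as.factor(svm_linear_preds))) +
  theme_minimal() +
  labs(colour = "Predictions")

As the comparison below notes, the resulting linear boundary assigns essentially all observations to a single class.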
(h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm_radial <- svm(y ~ ., data = data.frame(x1 = x1, x2 = x2, y = as.factor(y)))  # kernel defaults to radial
svm_radial_preds <- predict(svm_radial, data.frame(x1 = x1, x2 = x2, y = as.factor(y)))
ggplot() +
  geom_point(aes(x = x1, y = x2, colour = as.factor(svm_radial_preds))) +
  theme_minimal() +
  labs(colour = "Predictions")
The worst-performing classifier was the support vector classifier, which simply assigned all observations to a single class. The logistic regression without transformed predictors fitted a linear decision boundary that bears no resemblance to the underlying data. The SVM with a radial kernel models the true decision boundary quite well, as does the logistic regression with transformed variables; where the logistic regression was unable to capture the point where the classes ‘cross’ in the middle of the distribution, the SVM managed to, albeit not very accurately.
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(MASS)
library(e1071)
library(ISLR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
attach(Auto)
## The following object is masked from package:ggplot2:
##
## mpg
# Binary response: 1 if a car's mpg is above the median, 0 otherwise
var <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
Auto$mpglevel <- as.factor(var)
For a support vector classifier (linear kernel), a cost of 1 seems to perform best.
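The cross-validation behind that claim is not shown above; a minimal sketch, assuming the same cost grid as the polynomial and radial searches below:

set.seed(1)
tune.linear <- tune(svm, mpglevel ~ ., data = Auto, kernel = "linear",
                    ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.linear)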
set.seed(1)
tune.out <- tune(svm, mpglevel ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), degree = c(2, 3, 4)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 2
##
## - best performance: 0.3013462
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-02 2 0.5511538 0.04366593
## 2 1e-01 2 0.5511538 0.04366593
## 3 1e+00 2 0.5511538 0.04366593
## 4 5e+00 2 0.5511538 0.04366593
## 5 1e+01 2 0.5130128 0.08963366
## 6 1e+02 2 0.3013462 0.09961961
## 7 1e-02 3 0.5511538 0.04366593
## 8 1e-01 3 0.5511538 0.04366593
## 9 1e+00 3 0.5511538 0.04366593
## 10 5e+00 3 0.5511538 0.04366593
## 11 1e+01 3 0.5511538 0.04366593
## 12 1e+02 3 0.3446154 0.09821588
## 13 1e-02 4 0.5511538 0.04366593
## 14 1e-01 4 0.5511538 0.04366593
## 15 1e+00 4 0.5511538 0.04366593
## 16 5e+00 4 0.5511538 0.04366593
## 17 1e+01 4 0.5511538 0.04366593
## 18 1e+02 4 0.5511538 0.04366593
For a polynomial kernel, the lowest cross-validation error is obtained for a degree of 2 and a cost of 100, though even the best cross-validation error (0.30) is high.
set.seed(1)
tune.out <- tune(svm, mpglevel ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 0.01
##
## - best performance: 0.01282051
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-02 1e-02 0.55115385 0.04366593
## 2 1e-01 1e-02 0.08929487 0.04382379
## 3 1e+00 1e-02 0.07403846 0.03522110
## 4 5e+00 1e-02 0.04852564 0.03303346
## 5 1e+01 1e-02 0.02557692 0.02093679
## 6 1e+02 1e-02 0.01282051 0.01813094
## 7 1e-02 1e-01 0.21711538 0.09865227
## 8 1e-01 1e-01 0.07903846 0.03874545
## 9 1e+00 1e-01 0.05371795 0.03525162
## 10 5e+00 1e-01 0.02820513 0.03299190
## 11 1e+01 1e-01 0.03076923 0.03375798
## 12 1e+02 1e-01 0.03583333 0.02759051
## 13 1e-02 1e+00 0.55115385 0.04366593
## 14 1e-01 1e+00 0.55115385 0.04366593
## 15 1e+00 1e+00 0.06384615 0.04375618
## 16 5e+00 1e+00 0.05884615 0.04020934
## 17 1e+01 1e+00 0.05884615 0.04020934
## 18 1e+02 1e+00 0.05884615 0.04020934
## 19 1e-02 5e+00 0.55115385 0.04366593
## 20 1e-01 5e+00 0.55115385 0.04366593
## 21 1e+00 5e+00 0.49493590 0.04724924
## 22 5e+00 5e+00 0.48217949 0.05470903
## 23 1e+01 5e+00 0.48217949 0.05470903
## 24 1e+02 5e+00 0.48217949 0.05470903
## 25 1e-02 1e+01 0.55115385 0.04366593
## 26 1e-01 1e+01 0.55115385 0.04366593
## 27 1e+00 1e+01 0.51794872 0.05063697
## 28 5e+00 1e+01 0.51794872 0.04917316
## 29 1e+01 1e+01 0.51794872 0.04917316
## 30 1e+02 1e+01 0.51794872 0.04917316
## 31 1e-02 1e+02 0.55115385 0.04366593
## 32 1e-01 1e+02 0.55115385 0.04366593
## 33 1e+00 1e+02 0.55115385 0.04366593
## 34 5e+00 1e+02 0.55115385 0.04366593
## 35 1e+01 1e+02 0.55115385 0.04366593
## 36 1e+02 1e+02 0.55115385 0.04366593
For a radial kernel, the lowest cross-validation error is obtained for a gamma of 0.01 and a cost of 100.
svm.linear <- svm(mpglevel ~ ., data = Auto, kernel = "linear", cost = 1)
svm.poly <- svm(mpglevel ~ ., data = Auto, kernel = "polynomial", cost = 100, degree = 2)
svm.radial <- svm(mpglevel ~ ., data = Auto, kernel = "radial", cost = 100, gamma = 0.01)
plotpairs <- function(fit) {
  # Plot the fitted decision regions of mpg against each remaining predictor
  for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
    plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
  }
}
plotpairs(svm.linear)
plotpairs(svm.poly)
plotpairs(svm.radial)
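To put numbers beside the plots, a rough sketch comparing the training error rates of the three fits (training error is optimistic, but all three cost values came from the cross-validation above):

sapply(list(linear = svm.linear, poly = svm.poly, radial = svm.radial),
       function(fit) mean(predict(fit, Auto) != Auto$mpglevel))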
This problem involves the OJ data set which is part of the ISLR package.
library(ISLR)
library(e1071)
# Randomly split OJ into 800 training observations and the remaining 270 test observations
# (no seed is set here, so the exact split will vary between runs)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]
svm.linear <- svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 436
##
## ( 219 217 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The support vector classifier creates 436 support vectors out of the 800 training points: 219 belong to level CH and the remaining 217 to level MM.
train.pred <- predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 443 56
## MM 71 230
(56 + 71) / (443 + 230 + 56 + 71)
## [1] 0.15875
test.pred <- predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 126 28
## MM 27 89
(28 + 27) / (126 + 89 + 28 + 27)
## [1] 0.2037037
The training error rate is about 15.9% and the test error rate is about 20.4%.
set.seed(2)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.01778279
##
## - best performance: 0.1625
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.16875 0.04135299
## 2 0.01778279 0.16250 0.03864008
## 3 0.03162278 0.16625 0.03866254
## 4 0.05623413 0.16750 0.03689324
## 5 0.10000000 0.16500 0.03425801
## 6 0.17782794 0.16625 0.03283481
## 7 0.31622777 0.16625 0.03438447
## 8 0.56234133 0.16750 0.03343734
## 9 1.00000000 0.17125 0.03488573
## 10 1.77827941 0.17250 0.03670453
## 11 3.16227766 0.17250 0.03622844
## 12 5.62341325 0.17250 0.03525699
## 13 10.00000000 0.16875 0.03963812
svm.linear <- svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameter$cost)
train.pred <- predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 445 54
## MM 69 232
(54 + 69) / (445 + 232 + 54 + 69)
## [1] 0.15375
test.pred <- predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 126 28
## MM 27 89
(28 + 27) / (126 + 89 + 28 + 27)
## [1] 0.2037037
We may see that, with the best cost, the training error rate drops slightly to about 15.4%, while the test error rate is unchanged at about 20.4%.
svm.radial <- svm(Purchase ~ ., kernel = "radial", data = OJ.train)
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 363
##
## ( 186 177 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred <- predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 456 43
## MM 71 230
(43 + 71) / (456 + 230 + 43 + 71)
## [1] 0.1425
test.pred <- predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 136 18
## MM 32 84
(18 + 32) / (136 + 84 + 18 + 32)
## [1] 0.1851852
The radial kernel with the default gamma creates 363 support vectors, of which 186 belong to level CH and the remaining 177 to level MM. The classifier has a training error of about 14.3% and a test error of about 18.5%, a slight improvement over the linear kernel. We now use cross-validation to find the optimal cost.
set.seed(2)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.3162278
##
## - best performance: 0.16125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37625 0.05318012
## 2 0.01778279 0.37625 0.05318012
## 3 0.03162278 0.37625 0.05185785
## 4 0.05623413 0.20250 0.03809710
## 5 0.10000000 0.18000 0.04647281
## 6 0.17782794 0.16625 0.04041881
## 7 0.31622777 0.16125 0.03304563
## 8 0.56234133 0.16500 0.03476109
## 9 1.00000000 0.16375 0.03508422
## 10 1.77827941 0.16625 0.03821086
## 11 3.16227766 0.16625 0.03438447
## 12 5.62341325 0.17125 0.04084609
## 13 10.00000000 0.18125 0.04868051
svm.radial <- svm(Purchase ~ ., kernel = "radial", data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial", cost = tune.out$best.parameter$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.3162278
##
## Number of Support Vectors: 432
##
## ( 218 214 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred <- predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 455 44
## MM 76 225
(44 + 76) / (455 + 225 + 44 + 76)
## [1] 0.15
test.pred <- predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 133 21
## MM 33 83
(21 + 33) / (133 + 83 + 21 + 33)
## [1] 0.2
Tuning does not reduce the training or test error rates here: the cross-validated cost of about 0.32 does slightly worse on this split than the default cost of 1.
svm.poly <- svm(Purchase ~ ., kernel = "polynomial", data = OJ.train, degree = 2)
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 436
##
## ( 221 215 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred <- predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 462 37
## MM 103 198
(37 + 103) / (462 + 198 + 37 + 103)
## [1] 0.175
test.pred <- predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 142 12
## MM 42 74
(12 + 42) / (142 + 74 + 12 + 42)
## [1] 0.2
The polynomial kernel of degree 2 with the default cost creates 436 support vectors, of which 221 belong to level CH and the remaining 215 to level MM. The classifier has a training error of 17.5% and a test error of 20%, which is no improvement over the linear kernel. We now use cross-validation to find the optimal cost.
set.seed(2)
tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2,
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.17875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37625 0.05318012
## 2 0.01778279 0.36000 0.04743416
## 3 0.03162278 0.34625 0.04825065
## 4 0.05623413 0.32750 0.03425801
## 5 0.10000000 0.31125 0.03143004
## 6 0.17782794 0.23500 0.03574602
## 7 0.31622777 0.20000 0.03173239
## 8 0.56234133 0.19750 0.02622022
## 9 1.00000000 0.19375 0.03186887
## 10 1.77827941 0.18500 0.03106892
## 11 3.16227766 0.18500 0.03162278
## 12 5.62341325 0.18250 0.03545341
## 13 10.00000000 0.17875 0.03998698
svm.poly <- svm(Purchase ~ ., kernel = "polynomial", degree = 2, data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial",
## degree = 2, cost = tune.out$best.parameter$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 338
##
## ( 173 165 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred <- predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 464 35
## MM 77 224
(35 + 77) / (464 + 224 + 35 + 77)
## [1] 0.14
test.pred <- predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 139 15
## MM 35 81
(15 + 35) / (139 + 81 + 15 + 35)
## [1] 0.1851852
Tuning reduces both the training error rate (from 17.5% to 14%) and the test error rate (from 20% to about 18.5%).
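A compact way to compare the tuned models' test error rates directly (a sketch; it assumes svm.linear, svm.radial, and svm.poly above are still in scope):

sapply(list(linear = svm.linear, radial = svm.radial, poly = svm.poly),
       function(fit) mean(predict(fit, OJ.test) != OJ.test$Purchase))

On this split the tuned polynomial kernel gives the lowest test error, though the differences between the three kernels are small.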