5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(e1071)
library(ISLR)
**(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:**

> x1 = runif(500) - 0.5
> x2 = runif(500) - 0.5
> y = 1 * (x1^2 - x2^2 > 0)
set.seed(10)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)
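A quick sanity check on the class balance (not part of the original output):

table(y)  # counts of observations in each class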
(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
plot(x1[y == 0], x2[y == 0], col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(x1[y == 1], x2[y == 1], col = "red", pch = 1)
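For reference, the true boundary x1^2 = x2^2 consists of the two diagonals x2 = x1 and x2 = -x1; they can be overlaid on the plot (not part of the original figure):

abline(a = 0, b = 1, lty = 2)   # x2 = x1
abline(a = 0, b = -1, lty = 2)  # x2 = -x1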
(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
logit.fit <- glm(y ~ x1 + x2, family = "binomial")
summary(logit.fit)
##
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.347 -1.152 1.024 1.166 1.297
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.02449 0.08988 0.272 0.7853
## x1 0.23573 0.31281 0.754 0.4511
## x2 -0.52372 0.30367 -1.725 0.0846 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 693.12 on 499 degrees of freedom
## Residual deviance: 689.57 on 497 degrees of freedom
## AIC: 695.57
##
## Number of Fisher Scoring iterations: 3
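Neither coefficient is statistically significant, which makes sense: the class depends on x1 and x2 only through their squares, so there is no linear signal to find. The line the model will use as its decision boundary can be read from the coefficients:

coef(logit.fit)  # implied boundary at a 0.5 cutoff: 0.0245 + 0.2357*x1 - 0.5237*x2 = 0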
(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
data = data.frame(x1 = x1, x2 = x2, y = y)
prob = predict(logit.fit, data, type = "response")
pred = ifelse(prob > 0.52, 1, 0)  # a cutoff of 0.52 rather than 0.50 shifts the line slightly; any fixed cutoff gives a linear boundary here
data.pos = data[pred == 1, ]
data.neg = data[pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
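Since a model that is linear in x1 and x2 draws a straight line regardless of the cutoff, it cannot track the quadratic truth; the resulting training error can be checked directly (not shown in the original output):

mean(pred != y)  # training error of the linear logistic fit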
(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).
nonlin.fit = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), data = data, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(nonlin.fit)
##
## Call:
## glm(formula = y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), family = binomial,
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.640e-03 -2.000e-08 2.000e-08 2.000e-08 1.371e-03
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -119.0 25735.9 -0.005 0.996
## poly(x1, 2)1 -324.5 193257.8 -0.002 0.999
## poly(x1, 2)2 35118.5 990829.8 0.035 0.972
## poly(x2, 2)1 -3448.9 203826.6 -0.017 0.986
## poly(x2, 2)2 -37819.5 1016666.2 -0.037 0.970
## I(x1 * x2) -118.3 299756.2 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6.9312e+02 on 499 degrees of freedom
## Residual deviance: 5.9211e-06 on 494 degrees of freedom
## AIC: 12
##
## Number of Fisher Scoring iterations: 25
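The two warnings, together with the essentially zero residual deviance, indicate (quasi-)complete separation: the quadratic terms reproduce the true boundary almost exactly, so the fitted probabilities are pushed to 0 or 1 and the coefficients and standard errors blow up. A quick confirmation (not part of the original output):

range(fitted(nonlin.fit))  # fitted probabilities are numerically 0 and 1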
(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
probs <- predict(nonlin.fit, data, type = "response")
preds <- ifelse(probs > 0.50, 1, 0)
data.pos = data[preds == 1, ]
data.neg = data[preds == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
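The predicted labels now follow the quadratic boundary closely; the training error can be computed as before:

mean(preds != y)  # training error of the quadratic logistic fit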
(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.fit = svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
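As expected, the linear kernel yields a straight-line boundary; its predictions can be compared with the true labels directly (a quick check using the objects above):

table(pred = svm.pred, truth = data$y)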
(h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.fit = svm(as.factor(y) ~ x1 + x2, data, kernel = "radial", gamma = 1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
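Its training error can be checked the same way:

mean(svm.pred != data$y)  # training error of the radial-kernel SVM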
(i) Comment on your results.

The two linear approaches, logistic regression on X1 and X2 in (d) and the support vector classifier in (g), can only produce straight-line boundaries, so neither captures the true quadratic separation. Logistic regression with quadratic terms in (e)-(f) recovers the boundary almost perfectly, but only because we supplied the right transformations by hand. The radial-kernel SVM in (h) finds an equally good non-linear boundary without any manual feature engineering, which is the practical advantage of kernel methods.
7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(ISLR)
attach(Auto)
## The following object is masked from package:ggplot2:
##
## mpg
summary(Auto)
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
above = ifelse(mpg > median(mpg), 1, 0)
Auto = data.frame(Auto, above)
summary(Auto)
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
## above
## Min. :0.0
## 1st Qu.:0.0
## Median :0.5
## Mean :0.5
## 3rd Qu.:1.0
## Max. :1.0
##
(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
set.seed(10)
# note: `above` is numeric, so svm()/tune() fit a support vector *regression* here and the
# reported error is squared-error loss; wrapping the response in as.factor() (as in part (d)
# below) would make this a classifier and report misclassification error instead
tune.out = tune(svm, above ~ ., data = Auto, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.06810309
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.08291148 0.02165372
## 2 1e-01 0.07654201 0.02370418
## 3 1e+00 0.06810309 0.01237864
## 4 5e+00 0.07582447 0.01623029
## 5 1e+01 0.08106759 0.01733018
## 6 1e+02 0.10509634 0.02755424
Cross-validation favors cost = 1 (error 0.0681); smaller costs do slightly worse, and the error rises again for larger costs, with cost = 100 clearly worst (0.1051).

(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
tune.out = tune(svm, above ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.0656574
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.39569623 0.02527089
## 2 1e-01 0.09818807 0.01472997
## 3 1e+00 0.08008306 0.01672601
## 4 5e+00 0.07620118 0.01853415
## 5 1e+01 0.07454127 0.01927523
## 6 1e+02 0.06565740 0.01700316
tune.out = tune(svm, above ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.2018691
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.5075035 0.03639453
## 2 1e-01 0.5067402 0.03637260
## 3 1e+00 0.4990909 0.03616796
## 4 5e+00 0.4662392 0.03615754
## 5 1e+01 0.4277071 0.03759331
## 6 1e+02 0.2018691 0.04332831
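Tuning cost alone, the radial kernel reaches its best CV error (0.0657) at cost = 100, slightly better than the linear fit, while the polynomial kernel (left at its default degree of 3 here) does far worse at every cost, only reaching 0.202 at cost = 100. The prompt also asks for different values of gamma and degree; tune() accepts several parameters in its ranges list, so a fuller search would look like this (a sketch, not run here; as.factor() makes svm() classify, per the note in (b)):

tune.radial = tune(svm, as.factor(above) ~ ., data = Auto, kernel = "radial",
                   ranges = list(cost = c(0.1, 1, 10, 100), gamma = c(0.01, 0.1, 1)))
tune.poly = tune(svm, as.factor(above) ~ ., data = Auto, kernel = "polynomial",
                 ranges = list(cost = c(0.1, 1, 10, 100), degree = c(2, 3, 4)))
summary(tune.radial)
summary(tune.poly)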
(d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing > plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type > plot(svmfit, dat, x1 ~ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
Auto$above = as.factor(Auto$above)  # convert the response to a factor so svm() performs classification (plot.svm() requires a classification fit)
svm.polynomial = svm(above ~ ., data = Auto, kernel = "polynomial", cost = 10, degree = 2)
svm.linear = svm(above ~ ., data = Auto, kernel = "linear", cost = 1)
svm.radial = svm(above ~ ., data = Auto, kernel = "radial", cost = 10, gamma = 0.01)
plotpairs = function(fit) {
  # plot mpg against every other predictor ("above" is the response, "name" is an ID-like factor)
  for (name in names(Auto)[!(names(Auto) %in% c("mpg", "above", "name"))]) {
    plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
  }
}
plotpairs(svm.polynomial)
8. This problem involves the OJ data set which is part of the ISLR package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
train = sample(dim(OJ)[1], 800)  # note: no seed is set here, so the exact numbers below will vary from run to run
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 425
##
## ( 212 213 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
With cost = 0.01 the margin is wide: 425 of the 800 training observations are support vectors, split almost evenly between the classes (212 for CH, 213 for MM).

(c) What are the training and test error rates?
predTrain = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, predTrain)
## predTrain
## CH MM
## CH 422 56
## MM 77 245
predTest = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, predTest)
## predTest
## CH MM
## CH 149 26
## MM 27 68
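Reading off the tables: (56 + 77)/800 ≈ 16.6% training error and (26 + 27)/270 ≈ 19.6% test error. The same numbers can be computed directly with a small helper (not part of the original write-up):

errRate = function(pred, truth) mean(pred != truth)  # misclassification rate
errRate(predTrain, OJ.train$Purchase)
errRate(predTest, OJ.test$Purchase)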
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2,1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.16625
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.17500 0.04894725
## 2 0.01778279 0.17250 0.04556741
## 3 0.03162278 0.17250 0.04518481
## 4 0.05623413 0.17000 0.04257347
## 5 0.10000000 0.17000 0.04174992
## 6 0.17782794 0.16875 0.03963812
## 7 0.31622777 0.16750 0.03641962
## 8 0.56234133 0.16750 0.04048319
## 9 1.00000000 0.16625 0.03955042
## 10 1.77827941 0.16750 0.03496029
## 11 3.16227766 0.16875 0.03644345
## 12 5.62341325 0.16750 0.03917553
## 13 10.00000000 0.16875 0.04093101
Cross-validation favors cost = 1, although the error curve is quite flat: every value tried lies between 0.166 and 0.175.

(e) Compute the training and test error rates using this new value for cost.
svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameters$cost)
predTrain = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, predTrain)
## predTrain
## CH MM
## CH 416 62
## MM 63 259
predTest = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, predTest)
## predTest
## CH MM
## CH 147 28
## MM 23 72
The tuned cost gives (62 + 63)/800 ≈ 15.6% training error and (28 + 23)/270 ≈ 18.9% test error, a modest improvement over cost = 0.01.

(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial")
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 360
##
## ( 176 184 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
predTrain = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, predTrain)
## predTrain
## CH MM
## CH 435 43
## MM 70 252
predTest = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, predTest)
## predTest
## CH MM
## CH 150 25
## MM 28 67
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.16375
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.40250 0.05945353
## 2 0.01778279 0.40250 0.05945353
## 3 0.03162278 0.26375 0.08749008
## 4 0.05623413 0.19375 0.04093101
## 5 0.10000000 0.18500 0.03162278
## 6 0.17782794 0.17000 0.03736085
## 7 0.31622777 0.17125 0.03537988
## 8 0.56234133 0.16500 0.03717451
## 9 1.00000000 0.16375 0.03557562
## 10 1.77827941 0.16625 0.03537988
## 11 3.16227766 0.17125 0.03634805
## 12 5.62341325 0.17250 0.03322900
## 13 10.00000000 0.17750 0.02934469
The radial kernel with default parameters gives (43 + 70)/800 ≈ 14.1% training error and (25 + 28)/270 ≈ 19.6% test error, and cross-validation again picks cost = 1, svm()'s default, so the fit above already corresponds to the tuned model.

(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
svm.polynomial = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2)
summary(svm.polynomial)
predTrain = predict(svm.polynomial, OJ.train)
table(OJ.train$Purchase, predTrain)
## predTrain
## CH MM
## CH 446 32
## MM 88 234
predTest = predict(svm.polynomial, OJ.test)
table(OJ.test$Purchase, predTest)
## predTest
## CH MM
## CH 154 21
## MM 35 60
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.185
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.38000 0.03496029
## 2 0.01778279 0.37750 0.03216710
## 3 0.03162278 0.34750 0.04669642
## 4 0.05623413 0.32750 0.04479893
## 5 0.10000000 0.28875 0.04226652
## 6 0.17782794 0.22500 0.04124790
## 7 0.31622777 0.20875 0.04168749
## 8 0.56234133 0.19625 0.03387579
## 9 1.00000000 0.19250 0.03395258
## 10 1.77827941 0.18625 0.03701070
## 11 3.16227766 0.18500 0.04322101
## 12 5.62341325 0.19000 0.04401704
## 13 10.00000000 0.19000 0.04281744
(h) Overall, which approach seems to give the best results on this data?

On the test set, the tuned linear classifier does best ((28 + 23)/270 ≈ 18.9% error), followed closely by the radial kernel (≈ 19.6%) and then the polynomial kernel ((21 + 35)/270 ≈ 20.7%). The cross-validation errors agree (about 0.166, 0.164, and 0.185 respectively), so the linear and radial SVMs are essentially tied and the polynomial kernel lags behind; none of the models pushes the test error much below 19% on this data.