Chapter 9: 5, 7, 8
a) Generate a dataset with n=500 and p=2, such that the observations belong to two classes with a quadratic decision boudnary between them. For instance, you can do as this as follows:
set.seed(10)
x1 = runif(500)-0.5
x2=runif(500) - 0.5
y=1*(x1^2-x2^2 > 0)
b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.
plot(x1, x2, xlab = "X1", ylab = "X2", col=(4-y), pch = (3-y))
c) Fit a logistic regression model on the data, using X1 and X1 as predictors.
logit.fit = glm(y~ x1 + x2, family = "binomial")
summary(logit.fit)
##
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.347 -1.152 1.024 1.166 1.297
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.02449 0.08988 0.272 0.7853
## x1 0.23573 0.31281 0.754 0.4511
## x2 -0.52372 0.30367 -1.725 0.0846 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 693.12 on 499 degrees of freedom
## Residual deviance: 689.57 on 497 degrees of freedom
## AIC: 695.57
##
## Number of Fisher Scoring iterations: 3
d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear
data = data.frame(x1 = x1, x2=x2, y=y)
probs = predict(logit.fit, data, type = "response")
preds = rep(0, 500)
preds[probs > 0.47] = 1
plot(data[preds == 1, ]$x1, data[preds ==1, ]$x2, col = (4-1), pch = (3-1), xlab = "X1", ylab = "X2")
points(data[preds == 0, ]$x1, data[preds == 0, ]$x2, col = (4-0), pch = (3-0))
e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.
logitnl.fit = glm(y~ poly(x1,2), + poly(x2, 2) + I(x1*x2), family = "binomial")
summary(logitnl.fit)
##
## Call:
## glm(formula = y ~ poly(x1, 2), family = "binomial", data = +poly(x2,
## 2) + I(x1 * x2))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4422 -0.7503 0.2098 0.7913 1.8645
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1438 0.1113 1.292 0.197
## poly(x1, 2)1 2.9568 2.6618 1.111 0.267
## poly(x1, 2)2 33.1878 3.1146 10.656 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 693.12 on 499 degrees of freedom
## Residual deviance: 518.87 on 497 degrees of freedom
## AIC: 524.87
##
## Number of Fisher Scoring iterations: 4
f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
probs = predict(logitnl.fit, data, type = "response")
preds = rep(0, 500)
preds[probs > 0.47] = 1
plot(data[preds ==1, ]$x1, data[preds == 1, ]$x2, col = (4-1), pch = (3-1), xlab = "X1", ylab = "X2")
points(data[preds == 0, ]$x1, data[preds == 0, ]$x2, col = (4-0), pch = (3-0))
g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
library(e1071)
## Warning: package 'e1071' was built under R version 4.0.5
data$y = as.factor(data$y)
svm.fit = svm(y~ x1 + x2, data, kernal="linear", cost = 0.01)
preds = predict(svm.fit, data)
plot(data[preds == 1, ]$x1, data[preds == 1, ]$x2, col = (4 - 1), pch = (3 - 1), xlab = "X1", ylab = "X2")
points(data[preds == 0, ]$x1, data[preds == 0, ]$x2, col = (4 - 0), pch = (3 - 0))
h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
data$y = as.factor(data$y)
svmnl.fit = svm(y~ x1+ x2, data, kernel = "radial", gamma = 1)
preds = predict(svmnl.fit, data)
plot(data[preds ==1, ]$x1, data[preds ==1, ]$x2, col = (4 - 1), pch = (3 - 1), xlab = "X1", ylab = "X2")
points(data[preds ==0, ]$x1, data[preds ==0, ]$x2, col = (4 - 0), pch = (3 - 0), xlab = "X1", ylab = "X2")
i) Comment on your results
a) Create binary variable that takes on a 1 for cars with gas mileage above the median, and 0 for cars with gas mileage below the median.
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.5
var = ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
Auto$mpglevel = as.factor(var)
b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
set.seed(10)
tune.out = tune(svm, mpglevel~., data = Auto, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100, 1000)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.007628205
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.073974359 0.03067293
## 2 1e-01 0.050897436 0.03153169
## 3 1e+00 0.007628205 0.01228382
## 4 5e+00 0.015256410 0.01784872
## 5 1e+01 0.020384615 0.02019157
## 6 1e+02 0.033205128 0.01737168
## 7 1e+03 0.033205128 0.01737168
c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results
tune.out = tune(svm, mpglevel ~., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), degree = c(2, 3, 4)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 2
##
## - best performance: 0.2983974
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-02 2 0.5560256 0.03078585
## 2 1e-01 2 0.5560256 0.03078585
## 3 1e+00 2 0.5560256 0.03078585
## 4 5e+00 2 0.5560256 0.03078585
## 5 1e+01 2 0.5205769 0.07526613
## 6 1e+02 2 0.2983974 0.04882958
## 7 1e-02 3 0.5560256 0.03078585
## 8 1e-01 3 0.5560256 0.03078585
## 9 1e+00 3 0.5560256 0.03078585
## 10 5e+00 3 0.5560256 0.03078585
## 11 1e+01 3 0.5560256 0.03078585
## 12 1e+02 3 0.3339744 0.07186352
## 13 1e-02 4 0.5560256 0.03078585
## 14 1e-01 4 0.5560256 0.03078585
## 15 1e+00 4 0.5560256 0.03078585
## 16 5e+00 4 0.5560256 0.03078585
## 17 1e+01 4 0.5560256 0.03078585
## 18 1e+02 4 0.5560256 0.03078585
tune.out = tune(svm, mpglevel ~., data = Auto, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100), gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 0.01
##
## - best performance: 0.01282051
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-02 1e-02 0.56128205 0.04043851
## 2 1e-01 1e-02 0.09192308 0.02504871
## 3 1e+00 1e-02 0.07397436 0.02546808
## 4 5e+00 1e-02 0.04583333 0.03565912
## 5 1e+01 1e-02 0.01788462 0.01727588
## 6 1e+02 1e-02 0.01282051 0.01351401
## 7 1e-02 1e-01 0.21955128 0.07414576
## 8 1e-01 1e-01 0.07653846 0.02704017
## 9 1e+00 1e-01 0.05858974 0.03175102
## 10 5e+00 1e-01 0.03051282 0.02878864
## 11 1e+01 1e-01 0.02801282 0.02509397
## 12 1e+02 1e-01 0.03314103 0.02067257
## 13 1e-02 1e+00 0.56128205 0.04043851
## 14 1e-01 1e+00 0.56128205 0.04043851
## 15 1e+00 1e+00 0.06628205 0.03439069
## 16 5e+00 1e+00 0.06378205 0.03019594
## 17 1e+01 1e+00 0.06378205 0.03019594
## 18 1e+02 1e+00 0.06378205 0.03019594
## 19 1e-02 5e+00 0.56128205 0.04043851
## 20 1e-01 5e+00 0.56128205 0.04043851
## 21 1e+00 5e+00 0.50782051 0.07016412
## 22 5e+00 5e+00 0.50525641 0.07145882
## 23 1e+01 5e+00 0.50525641 0.07145882
## 24 1e+02 5e+00 0.50525641 0.07145882
## 25 1e-02 1e+01 0.56128205 0.04043851
## 26 1e-01 1e+01 0.56128205 0.04043851
## 27 1e+00 1e+01 0.53326923 0.06812082
## 28 5e+00 1e+01 0.52301282 0.05688936
## 29 1e+01 1e+01 0.52301282 0.05688936
## 30 1e+02 1e+01 0.52301282 0.05688936
## 31 1e-02 1e+02 0.56128205 0.04043851
## 32 1e-01 1e+02 0.56128205 0.04043851
## 33 1e+00 1e+02 0.56128205 0.04043851
## 34 5e+00 1e+02 0.56128205 0.04043851
## 35 1e+01 1e+02 0.56128205 0.04043851
## 36 1e+02 1e+02 0.56128205 0.04043851
d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p=2. When p>2 you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, isntead of typing plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit, dat, x1~x4) in order to plot just the first and fourth variables. However, you must replace X1 and X4 with the correct variable names. To find out more, type?plot.svm.
svm.linear = svm(mpglevel~., data = Auto, kernel = "linear", cost = 1)
svm.poly = svm(mpglevel~., data = Auto, kernel = "polynomial", cost = 100, degree = 2)
svm.radial = svm(mpglevel~., data = Auto, kernel = "radial", cost = 100, gamma = 0.01)
plotpairs = function(fit) {
for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
}
}
plotpairs(svm.linear)
plotpairs(svm.poly)
plotpairs(svm.radial)
a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(10)
train = sample(nrow(OJ), 800)
OJ.train = OJ[train,]
OJ.test = OJ[-train,]
b) Fit a support vector classifier to the training data using cost-0.01, with Purchase as the response and other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm.linear = svm(Purchase~., data = OJ.train, kernel = "linear", cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 451
##
## ( 226 225 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
c) What are the training and test error rates?
train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 431 60
## MM 84 225
(431+225)/(431+60+84+225)
## [1] 0.82
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 143 19
## MM 23 85
(143+85)/(143+19+23+85)
## [1] 0.8444444
d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
tune.out = tune(svm, Purchase~., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2,1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.18125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.18375 0.05402224
## 2 0.01778279 0.18250 0.05374838
## 3 0.03162278 0.18500 0.05230785
## 4 0.05623413 0.18250 0.05109903
## 5 0.10000000 0.18625 0.05350558
## 6 0.17782794 0.18500 0.05230785
## 7 0.31622777 0.18750 0.04823265
## 8 0.56234133 0.18125 0.04832256
## 9 1.00000000 0.18125 0.05043216
## 10 1.77827941 0.18250 0.05277047
## 11 3.16227766 0.18375 0.05239076
## 12 5.62341325 0.18250 0.05109903
## 13 10.00000000 0.18875 0.05015601
e) Compute the training and test error rates using this new value for cost.
svm.linear = svm(Purchase~., kernel = "linear", data = OJ.train, cost = tune.out$best.parameter$cost)
train.pred = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 431 60
## MM 77 232
(431+232)/(431+232+60+77)
## [1] 0.82875
test.pred = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 145 17
## MM 21 87
(145+87)/(145+87+17+21)
## [1] 0.8592593
f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm.radial = svm(Purchase~., kernel = "radial", data = OJ.train)
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 389
##
## ( 199 190 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 448 43
## MM 86 223
(448+223)/(448+43+86+223)
## [1] 0.83875
test.pred = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 147 15
## MM 23 85
(147+85)/(147+15+23+85)
## [1] 0.8592593
tune.out = tune(svm, Purchase~., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1778279
##
## - best performance: 0.185
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.38625 0.04308019
## 2 0.01778279 0.38625 0.04308019
## 3 0.03162278 0.38375 0.04372023
## 4 0.05623413 0.22250 0.03899786
## 5 0.10000000 0.19500 0.03593976
## 6 0.17782794 0.18500 0.03106892
## 7 0.31622777 0.18750 0.03486083
## 8 0.56234133 0.18500 0.03374743
## 9 1.00000000 0.18625 0.03304563
## 10 1.77827941 0.18500 0.03987829
## 11 3.16227766 0.18875 0.03701070
## 12 5.62341325 0.19000 0.03525699
## 13 10.00000000 0.20250 0.03106892
svm.radial = svm(Purchase~., data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, cost = tune.out$best.parameter$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.1778279
##
## Number of Support Vectors: 497
##
## ( 250 247 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 448 43
## MM 92 217
(448+217)/(448+43+92+217)
## [1] 0.83125
test.pred = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 146 16
## MM 27 81
(146+81)/(146+16+27+81)
## [1] 0.8407407
g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
svm.poly = svm(Purchase~., kernel = "polynomial", data = OJ.train, degree = 2)
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 461
##
## ( 234 227 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 455 36
## MM 119 190
(455+190)/(455+36+119+190)
## [1] 0.80625
test.pred = predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 149 13
## MM 34 74
(149+74)/(149+13+34+74)
## [1] 0.8259259
tune.out = tune(svm, Purchase~., data = OJ.train, kernel = "polynomial", degree =2, ranges = list(cost=10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.1875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.38625 0.05382908
## 2 0.01778279 0.36625 0.04251225
## 3 0.03162278 0.36000 0.04362084
## 4 0.05623413 0.33500 0.05130248
## 5 0.10000000 0.31375 0.04767147
## 6 0.17782794 0.26875 0.04868051
## 7 0.31622777 0.21000 0.03476109
## 8 0.56234133 0.20875 0.03775377
## 9 1.00000000 0.20625 0.03019037
## 10 1.77827941 0.19500 0.03446012
## 11 3.16227766 0.19250 0.03238227
## 12 5.62341325 0.18875 0.03087272
## 13 10.00000000 0.18750 0.03385016
svm.poly = svm(Purchase~., kernel = "polynomial", degree = 2, data = OJ.train, cost = tune.out$best.parameter$cost)
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial",
## degree = 2, cost = tune.out$best.parameter$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 360
##
## ( 185 175 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred = predict(svm.poly, OJ.train)
table(OJ.train$Purchase, train.pred)
## train.pred
## CH MM
## CH 451 40
## MM 85 224
(451+224)/(451+40+85+224)
## [1] 0.84375
test.pred = predict(svm.poly, OJ.test)
table(OJ.test$Purchase, test.pred)
## test.pred
## CH MM
## CH 148 14
## MM 27 81
(148+81)/(148+14+27+81)
## [1] 0.8481481
h) Overall, which approach seems to give the best results on this data?