5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features. (a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
library(ggplot2)  # ggplot()
library(e1071)    # svm(), tune()
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)
y <- as.factor(y)
dataset <- data.frame(y, x1, x2)
ggplot(dataset,aes(x1, x2, color=y))+geom_point()
lm.model <- glm(y~., data = dataset, family="binomial")
summary(lm.model)
##
## Call:
## glm(formula = y ~ ., family = "binomial", data = dataset)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.176 -1.142 -1.112 1.215 1.250
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.08867 0.08956 -0.990 0.322
## x1 -0.08924 0.32582 -0.274 0.784
## x2 -0.09819 0.30629 -0.321 0.749
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 692.00 on 497 degrees of freedom
## AIC: 698
##
## Number of Fisher Scoring iterations: 3
lm.prob <-predict(lm.model, type="response")
lm.pred <- rep("No", 500)
lm.pred[lm.prob>0.5]="Yes"
ggplot(dataset, aes(x1, x2, color=lm.pred)) + geom_point()
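The coefficients in `summary(lm.model)` above show why the linear fit fails. These numbers come from this particular random draw. The fitted probability exceeds 0.5 exactly when the linear predictor is positive:

```latex
\hat{p}(x_1, x_2) > 0.5
\iff -0.08867 - 0.08924\,x_1 - 0.09819\,x_2 > 0
\iff x_2 < -0.903 - 0.909\,x_1 .
```

Every observation has x2 ≥ -0.5, so the last inequality can hold only when x1 < -0.443. That is a sliver near the corner (-0.5, -0.5) covering well under 1% of the square. Essentially every observation is therefore labelled "No" (class 0), and the linear logistic model does no better than always predicting one class.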
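# Quadratic transformation of x1 only. Since x1T - x2^2 equals x1^2 - x2^2,
# y1 is the same response as y; the model below is linear in x1^2 and x2
# (there is no x2^2 term).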
x1T <- x1^2
y1 <- 1 * (x1T - x2^2 > 0)
y1 <- as.factor(y1)
dataset2 <- data.frame(y1, x1T, x2)
new.model <- glm(y1~., data=dataset2, family="binomial")
summary(new.model)
##
## Call:
## glm(formula = y1 ~ ., family = "binomial", data = dataset2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5467 -0.7052 -0.5374 0.6234 1.9369
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7132 0.1695 -10.109 <2e-16 ***
## x1T 23.6092 2.1565 10.948 <2e-16 ***
## x2 -0.5263 0.3817 -1.379 0.168
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 481.58 on 497 degrees of freedom
## AIC: 487.58
##
## Number of Fisher Scoring iterations: 5
lm.prob <-predict(new.model, type="response")
lm.pred <- rep("No", 500)
lm.pred[lm.prob>0.5]="Yes"
ggplot(dataset, aes(x1, x2, color=lm.pred)) + geom_point()
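The model above squares x1 but enters x2 only linearly. From its coefficients, the fitted boundary is close to the vertical lines |x1| ≈ 0.27 (at x2 = 0, x1^2 = 1.7132/23.6092 ≈ 0.073), so it cannot trace the diagonal boundary x1^2 = x2^2.

Below is a minimal sketch of a logistic regression that uses both squared predictors. It assumes the same simulation design as part (a) on freshly drawn data. The seed and the object names (`sim`, `quad.fit`, ...) are illustrative, so the numbers will not match the run above.

```r
# Sketch: logistic regression on both squared predictors (simulated data;
# the seed is arbitrary, so results differ from the run above).
set.seed(42)
sim <- data.frame(x1 = runif(500) - 0.5, x2 = runif(500) - 0.5)
sim$y <- factor(as.integer(sim$x1^2 - sim$x2^2 > 0))

# I() makes ^ an arithmetic square inside the formula. The classes are
# perfectly separated by x1^2 - x2^2 = 0, so glm() typically warns that
# fitted probabilities of 0 or 1 occurred; the warnings are suppressed
# because only the predicted classes are used below.
quad.fit <- suppressWarnings(
  glm(y ~ I(x1^2) + I(x2^2), data = sim, family = binomial)
)
quad.prob <- predict(quad.fit, type = "response")   # P(y = 1)
quad.pred <- ifelse(quad.prob > 0.5, "1", "0")

# Training accuracy: share of points on the correct side of the boundary.
quad.acc <- mean(quad.pred == as.character(sim$y))
quad.acc
```

The fitted coefficients should come out positive on `I(x1^2)` and negative on `I(x2^2)`, matching the true rule x1^2 - x2^2 > 0.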
set.seed(1)
svmfit <- svm(y~x1+x2, data=dataset, kernel="linear", cost=5, scale=F)
plot(svmfit, dataset)
summary(svmfit)
##
## Call:
## svm(formula = y ~ x1 + x2, data = dataset, kernel = "linear", cost = 5,
## scale = F)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
##
## Number of Support Vectors: 480
##
## ( 239 241 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
svm.predict <- predict(svmfit, dataset)
ggplot(dataset, aes(x1, x2, color=svm.predict)) + geom_point()
set.seed(1)
svmfit1 <- svm(y~., data=dataset, kernel="radial", cost=1000, gamma=0.5)
summary(svmfit1)
##
## Call:
## svm(formula = y ~ ., data = dataset, kernel = "radial", cost = 1000,
## gamma = 0.5)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1000
##
## Number of Support Vectors: 23
##
## ( 13 10 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
svm.predict <- predict(svmfit1, dataset)
ggplot(dataset, aes(x1, x2, color=svm.predict)) + geom_point()
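The radial fit above uses gamma = 0.5. The e1071 documentation gives its radial kernel as K(u, v) = exp(-gamma * |u - v|^2). The base-R sketch below applies that formula to two made-up points to show how gamma controls how fast similarity decays with distance. The function and point names are hypothetical. Note that `svm()` standardizes predictors by default (`scale = TRUE`), so in the actual fit the distances are computed on scaled x1 and x2.

```r
# Sketch of the radial (Gaussian) kernel as documented for e1071::svm():
# K(u, v) = exp(-gamma * ||u - v||^2).
radial_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

# Two hypothetical points in the (x1, x2) square; squared distance = 0.2.
pt_a <- c(0.1, -0.2)
pt_b <- c(0.3, 0.2)

k_small <- radial_kernel(pt_a, pt_b, gamma = 0.5)  # exp(-0.1), about 0.905
k_large <- radial_kernel(pt_a, pt_b, gamma = 4)    # exp(-0.8), about 0.449
k_self  <- radial_kernel(pt_a, pt_a, gamma = 0.5)  # identical points give 1
c(k_small, k_large, k_self)
```

A larger gamma makes each support vector's influence more local, which gives a more flexible (and more overfitting-prone) boundary.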
7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set. (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
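library(ISLR2)  # Auto and OJ data sets
library(dplyr)  # mutate(), select(), relocate()
# 1 if mpg is above the median, 0 otherwise (a two-branch case_when() would
# return NA for any car whose mpg equals the median exactly)
Auto <- Auto |> mutate(mpg01 = as.factor(ifelse(mpg > median(mpg), 1, 0)))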
Auto_new <- Auto|>select(!c(mpg,name))
Auto_new <- Auto_new|>relocate(mpg01, .before = 1)
set.seed(2)
tuned <- tune(svm, mpg01 ~ ., data = Auto_new, kernel = "linear", ranges = list(cost = c(0.1, 1, 5, 10, 100)))
summary(tuned)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.08929487
##
## - Detailed performance results:
## cost error dispersion
## 1 0.1 0.09679487 0.06737312
## 2 1.0 0.09948718 0.05438007
## 3 5.0 0.08929487 0.03849957
## 4 10.0 0.08929487 0.03849957
## 5 100.0 0.09185897 0.04029356
bestmodL <- tuned$best.model
summary(bestmodL)
##
## Call:
## best.tune(METHOD = svm, train.x = mpg01 ~ ., data = Auto_new, ranges = list(cost = c(0.1,
## 1, 5, 10, 100)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
##
## Number of Support Vectors: 83
##
## ( 41 42 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
ypred <- predict(bestmodL, Auto_new)
table(predict=ypred, truth=Auto_new$mpg01)
## truth
## predict 0 1
## 0 174 11
## 1 22 185
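After performing cross-validation, we see that the best choice of cost is 5 (CV error 0.0893; cost = 10 gives the same error). Predicting on the same 392 cars misclassifies 11 + 22 = 33 of them, a training error rate of about 0.084. These cars were also used to fit and tune the model, so this is a training error rate rather than a test error rate. The CV error of 0.089 is the better estimate of test performance.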
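`tune()` picks the cost with the lowest 10-fold cross-validated misclassification error. The procedure is sketched below in base R, with one deliberate substitution: a logistic regression stands in for the SVM so the sketch runs without e1071, and the data are simulated rather than Auto. The function `kfold_error` and the `cv_demo` data are hypothetical names.

```r
# Sketch of k-fold cross-validated misclassification error, the kind of
# estimate tune() reports in its "error" column. Logistic regression is a
# stand-in classifier here, not what tune(svm, ...) fits.
kfold_error <- function(data, formula, k = 10) {
  # Randomly assign each row to one of k roughly equal folds.
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_err <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- glm(formula, data = train, family = binomial)
    prob  <- predict(fit, newdata = test, type = "response")
    pred  <- as.integer(prob > 0.5)
    # Misclassification rate on the held-out fold (response coded 0/1).
    fold_err[i] <- mean(pred != test[[response]])
  }
  mean(fold_err)
}

# Simulated example: y depends on x plus noise.
set.seed(1)
cv_demo <- data.frame(x = rnorm(300))
cv_demo$y <- as.integer(cv_demo$x + rnorm(300, sd = 0.5) > 0)
cv_err <- kfold_error(cv_demo, y ~ x, k = 10)
cv_err
```

To tune a hyperparameter this way, you would call the function once per candidate value (for example each cost) and keep the value with the smallest CV error. This is conceptually what `ranges = list(cost = ...)` automates.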
Ans:
#Polynomial
set.seed(1)
tune.out <- tune(svm, mpg01 ~ ., data = Auto_new,
                 kernel = "polynomial",
                 ranges = list(
                   cost = c(0.1, 1, 5, 10, 100),
                   degree = c(1, 2, 3, 4)
                 ))
bestmodP <- tune.out$best.model
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 3
##
## - best performance: 0.08423077
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.1 1 0.08673077 0.04551036
## 2 1.0 1 0.08929487 0.04229479
## 3 5.0 1 0.08429487 0.03229996
## 4 10.0 1 0.08435897 0.03662670
## 5 100.0 1 0.08948718 0.03898410
## 6 0.1 2 0.27846154 0.09486227
## 7 1.0 2 0.25307692 0.13751948
## 8 5.0 2 0.17621795 0.04945319
## 9 10.0 2 0.18647436 0.05598001
## 10 100.0 2 0.18128205 0.06251437
## 11 0.1 3 0.20192308 0.11347783
## 12 1.0 3 0.09448718 0.04180527
## 13 5.0 3 0.08429487 0.04016554
## 14 10.0 3 0.08435897 0.04544023
## 15 100.0 3 0.08423077 0.03636273
## 16 0.1 4 0.26564103 0.09977887
## 17 1.0 4 0.21205128 0.09560470
## 18 5.0 4 0.18371795 0.06175709
## 19 10.0 4 0.16589744 0.06962914
## 20 100.0 4 0.12756410 0.05208506
ypred <- predict(bestmodP, Auto_new)
table(predict=ypred, truth=Auto_new$mpg01)
## truth
## predict 0 1
## 0 190 8
## 1 6 188
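After performing cross-validation, we see that the best polynomial kernel uses cost = 100 and degree = 3 (CV error 0.0842, essentially tied with degree 1 at cost = 5, 0.0843). On the full data set it misclassifies 8 + 6 = 14 of 392 cars, a training error rate of about 0.036. As with the linear kernel, this is not a test error rate.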
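The e1071 documentation gives the polynomial kernel as (gamma * <u, v> + coef0)^degree, with coef0 defaulting to 0 (the `coef.0: 0` line in the OJ polynomial summary later shows this default) and gamma to 1/(data dimension). The base-R sketch below, using hypothetical points, checks two consequences of coef0 = 0:

- The degree-2 kernel equals an inner product of pairwise products only, with no linear terms.
- For even degrees the kernel satisfies K(u, -v) = K(u, v).

```r
# Sketch of the polynomial kernel as documented for e1071::svm():
# K(u, v) = (gamma * <u, v> + coef0)^degree.
poly_kernel <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

# With coef0 = 0 and degree = 2, the kernel equals an inner product of
# gamma-scaled pairwise products u_i * u_j (no constant or linear terms).
phi2 <- function(u, gamma) gamma * as.vector(outer(u, u))

a_pt <- c(1, 2, 3)       # hypothetical points
b_pt <- c(0.5, -1, 2)
k_direct <- poly_kernel(a_pt, b_pt, gamma = 1/3, coef0 = 0, degree = 2)
k_map    <- sum(phi2(a_pt, 1/3) * phi2(b_pt, 1/3))

# Even degree with coef0 = 0: flipping the sign of one point changes nothing.
k_flip   <- poly_kernel(a_pt, -b_pt, gamma = 1/3, coef0 = 0, degree = 2)
c(k_direct, k_map, k_flip)   # all equal to 2.25
```

The symmetry means a degree-2 or degree-4 decision function with coef0 = 0 gives the same score to x and -x in the scaled predictor space. It cannot express a one-directional effect such as "heavier cars get lower mileage". This is one plausible reason, not a proven one, why degrees 2 and 4 have much higher CV error than degrees 1 and 3 in the table above.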
#Radial
set.seed(1)
tune.out <- tune(svm, mpg01 ~ ., data = Auto_new,
                 kernel = "radial",
                 ranges = list(
                   cost = c(0.1, 1, 5, 10, 100),
                   gamma = c(0.5, 1, 2, 3, 4)
                 ))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 1 1
##
## - best performance: 0.06634615
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 0.1 0.5 0.08666667 0.04687413
## 2 1.0 0.5 0.06884615 0.02963114
## 3 5.0 0.5 0.07903846 0.03051601
## 4 10.0 0.5 0.08923077 0.02732003
## 5 100.0 0.5 0.10698718 0.03512661
## 6 0.1 1.0 0.08673077 0.04535158
## 7 1.0 1.0 0.06634615 0.03244101
## 8 5.0 1.0 0.08916667 0.02708952
## 9 10.0 1.0 0.08923077 0.02732003
## 10 100.0 1.0 0.10448718 0.04560852
## 11 0.1 2.0 0.14282051 0.07578262
## 12 1.0 2.0 0.08673077 0.04371113
## 13 5.0 2.0 0.09942308 0.04881948
## 14 10.0 2.0 0.09429487 0.05387705
## 15 100.0 2.0 0.09435897 0.05261602
## 16 0.1 3.0 0.31878205 0.13973969
## 17 1.0 3.0 0.08416667 0.04171436
## 18 5.0 3.0 0.09179487 0.05416218
## 19 10.0 3.0 0.09179487 0.05416218
## 20 100.0 3.0 0.09692308 0.05501861
## 21 0.1 4.0 0.51788462 0.06842176
## 22 1.0 4.0 0.08923077 0.03843042
## 23 5.0 4.0 0.09173077 0.05268076
## 24 10.0 4.0 0.09179487 0.05139393
## 25 100.0 4.0 0.09435897 0.05398656
bestmodelR <-tune.out$best.model
ypred <- predict(bestmodelR, Auto_new)
table(predict=ypred, truth=Auto_new$mpg01)
## truth
## predict 0 1
## 0 188 6
## 1 8 190
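After performing cross-validation, we see that the best radial kernel uses cost = 1 and gamma = 1, which gives the lowest CV error of the three kernels (0.0663). It misclassifies 6 + 8 = 14 of 392 cars on the full data set, a training error rate of about 0.036.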
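The three kernels can be compared side by side by recomputing the training error rates from the confusion matrices printed above and setting them next to the CV errors from the `summary(tune.out)` outputs. The counts below are copied by hand (rows = predicted class, columns = true class). `conf_err` is a hypothetical helper name.

```r
# Error rate = 1 - (correct predictions / total), from a 2x2 confusion
# matrix given row by row (rows = predicted, columns = truth).
conf_err <- function(counts) {
  tab <- matrix(counts, nrow = 2, byrow = TRUE)
  1 - sum(diag(tab)) / sum(tab)
}

auto_results <- data.frame(
  kernel      = c("linear", "polynomial", "radial"),
  train_error = c(conf_err(c(174, 11, 22, 185)),
                  conf_err(c(190, 8, 6, 188)),
                  conf_err(c(188, 6, 8, 190))),
  cv_error    = c(0.08929487, 0.08423077, 0.06634615)  # from summary(tune.out)
)
auto_results
```

In every row the training error is below the CV error, as expected when a model is scored on the data it was fitted to. The CV column is the one to use for choosing a kernel, and it favours the radial kernel.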
# plot.svm() draws a 2-D slice: predictors not on the axes are held at the
# values given in `slice`, which default to 0 for numeric variables.
#LINEAR KERNEL
plot(bestmodL, Auto_new, acceleration ~ weight)
plot(bestmodL, Auto_new, acceleration ~ horsepower)
#POLYNOMIAL KERNEL
plot(bestmodP, Auto_new, acceleration ~ weight)
plot(bestmodP, Auto_new, acceleration ~ horsepower)
#RADIAL KERNEL
plot(bestmodelR, Auto_new, acceleration ~ weight)
plot(bestmodelR, Auto_new, acceleration ~ horsepower)
8. This problem involves the OJ data set, which is part of the ISLR2 package. (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(2)
trainOJ <- sample(1:nrow(OJ), 800)
OJ.test <- OJ[-trainOJ, ]
OJtrain <- OJ[trainOJ,]
svmfit <- svm(Purchase ~ ., data = OJ, subset = trainOJ, kernel = "linear", cost = 0.01, scale = FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ, kernel = "linear", cost = 0.01,
## subset = trainOJ, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 614
##
## ( 306 308 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
trainpred <- predict(svmfit, OJtrain)
testpred <- predict(svmfit, OJ.test)
table(predict=trainpred, truth=OJtrain$Purchase)
## truth
## predict CH MM
## CH 449 145
## MM 41 165
table(predict=testpred, truth=OJ.test$Purchase)
## truth
## predict CH MM
## CH 146 59
## MM 17 48
1 - (146 + 48) / 270  # test error rate = 1 - (correct predictions / test size)
## [1] 0.2814815
Train error rate: (145 + 41)/800 = 0.2325. Test error rate: (59 + 17)/270 ≈ 0.281.
set.seed (1)
tune.out <- tune(svm, Purchase ~ ., data = OJtrain, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.1675
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.17500 0.04677072
## 2 0.10 0.17375 0.05050096
## 3 1.00 0.17000 0.04721405
## 4 5.00 0.16750 0.03961621
## 5 10.00 0.16750 0.04297932
The optimal cost is 5, with a 10-fold CV error of 0.1675 (cost = 10 gives the same error).
best.mod <- tune.out$best.model
summary(best.mod)
##
## Call:
## best.tune(METHOD = svm, train.x = Purchase ~ ., data = OJtrain, ranges = list(cost = c(0.01,
## 0.1, 1, 5, 10)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
##
## Number of Support Vectors: 316
##
## ( 157 159 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
trainpred <- predict(best.mod, OJtrain)
testpred <- predict(best.mod, OJ.test)
table(predict=trainpred, truth=OJtrain$Purchase)
## truth
## predict CH MM
## CH 433 68
## MM 57 242
table(predict=testpred, truth=OJ.test$Purchase)
## truth
## predict CH MM
## CH 144 30
## MM 19 77
Train error rate: (68 + 57)/800 ≈ 0.156. Test error rate: (30 + 19)/270 ≈ 0.181.
set.seed(2)
svmfitR <- svm(Purchase ~ ., data = OJtrain, kernel = "radial", cost = 0.01, gamma = 1)  # untuned fit, not evaluated below
tune.out <- tune(svm, Purchase ~ ., data = OJtrain, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), gamma = 1)
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.20125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.38750 0.08799463
## 2 0.10 0.33500 0.07835106
## 3 1.00 0.20125 0.04767147
## 4 5.00 0.21000 0.04816061
## 5 10.00 0.21500 0.04706674
best.mod <- tune.out$best.model
summary(best.mod)
##
## Call:
## best.tune(METHOD = svm, train.x = Purchase ~ ., data = OJtrain, ranges = list(cost = c(0.01,
## 0.1, 1, 5, 10)), kernel = "radial", gamma = 1)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 468
##
## ( 215 253 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
trainpred <- predict(best.mod, OJtrain)
testpred <- predict(best.mod, OJ.test)
table(predict=trainpred, truth=OJtrain$Purchase)
## truth
## predict CH MM
## CH 456 52
## MM 34 258
table(predict=testpred, truth=OJ.test$Purchase)
## truth
## predict CH MM
## CH 139 33
## MM 24 74
For the tuned radial model: train error rate (34 + 52)/800 = 0.1075; test error rate (24 + 33)/270 ≈ 0.211.
set.seed(1)
svmfitP <- svm(Purchase ~ ., data = OJtrain, kernel = "polynomial", cost = 0.01, degree = 2)
trainpred <- predict(svmfitP, OJtrain)
testpred <- predict(svmfitP, OJ.test)
table(predict=trainpred, truth=OJtrain$Purchase)
## truth
## predict CH MM
## CH 489 292
## MM 1 18
table(predict=testpred, truth=OJ.test$Purchase)
## truth
## predict CH MM
## CH 162 100
## MM 1 7
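Untuned (cost = 0.01): train error rate (292 + 1)/800 ≈ 0.366; test error rate (100 + 1)/270 ≈ 0.374. Almost every observation is predicted CH.
tune.out <- tune(svm, Purchase ~ ., data = OJtrain, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), degree = 2)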
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.17
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.38250 0.06157651
## 2 0.10 0.31000 0.03944053
## 3 1.00 0.19375 0.02144923
## 4 5.00 0.18000 0.03016160
## 5 10.00 0.17000 0.02958040
best.mod <- tune.out$best.model
summary(best.mod)
##
## Call:
## best.tune(METHOD = svm, train.x = Purchase ~ ., data = OJtrain, ranges = list(cost = c(0.01,
## 0.1, 1, 5, 10)), kernel = "polynomial", degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 329
##
## ( 163 166 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
trainpred <- predict(best.mod, OJtrain)
testpred <- predict(best.mod, OJ.test)
table(predict=trainpred, truth=OJtrain$Purchase)
## truth
## predict CH MM
## CH 452 66
## MM 38 244
table(predict=testpred, truth=OJ.test$Purchase)
## truth
## predict CH MM
## CH 141 32
## MM 22 75
Training error rate: (38 + 66)/800 = 0.13. Test error rate: (22 + 32)/270 = 0.2.
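Ans: Overall, the tuned linear kernel (cost = 5) gives the best results on this split. Its test error rate (0.181) is lower than those of the tuned polynomial (0.200) and radial (0.211) kernels, and it also had the lowest 10-fold CV error (0.1675, versus 0.17 and 0.20125).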
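The conclusion can be checked by recomputing every train and test error rate from the confusion matrices printed above. The counts are entered by hand, given row by row (predicted CH, then predicted MM). `conf_err` and `oj_results` are hypothetical names. All numbers come from this one random 800/270 split, so gaps of a point or two between models may not hold for other splits.

```r
# Error rate = 1 - (correct predictions / total), from a 2x2 confusion
# matrix given row by row (rows = predicted, columns = truth).
conf_err <- function(counts) {
  tab <- matrix(counts, nrow = 2, byrow = TRUE)
  1 - sum(diag(tab)) / sum(tab)
}

oj_results <- data.frame(
  model = c("linear, cost 0.01, unscaled",
            "linear, cost 5 (tuned)",
            "radial, cost 1, gamma 1 (tuned)",
            "polynomial deg 2, cost 0.01",
            "polynomial deg 2, cost 10 (tuned)"),
  train_error = c(conf_err(c(449, 145, 41, 165)),
                  conf_err(c(433, 68, 57, 242)),
                  conf_err(c(456, 52, 34, 258)),
                  conf_err(c(489, 292, 1, 18)),
                  conf_err(c(452, 66, 38, 244))),
  test_error  = c(conf_err(c(146, 59, 17, 48)),
                  conf_err(c(144, 30, 19, 77)),
                  conf_err(c(139, 33, 24, 74)),
                  conf_err(c(162, 100, 1, 7)),
                  conf_err(c(141, 32, 22, 75)))
)

# Sort from best to worst test error.
oj_results <- oj_results[order(oj_results$test_error), ]
oj_results
```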