library(e1071)
library(ISLR)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## v purrr   0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::lift()   masks caret::lift()

5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

  (a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
 x1 = runif(500)-0.5
 x2 = runif(500)-0.5
 y = 1*(x1^2-x2^2 > 0)
  (b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
plot(x1,x2,col=ifelse(y,'red','blue'))
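Since y = 1 exactly when x1^2 > x2^2, i.e. when |x1| > |x2|, the true decision boundary is the pair of lines x2 = x1 and x2 = -x1; they can be overlaid on the plot above to confirm the class pattern:

abline(0,1,lty=2)    # x2 =  x1
abline(0,-1,lty=2)   # x2 = -x1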

  (c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
glm.fit=glm(y~. ,family='binomial', data=data.frame(x1,x2,y))
summary(glm.fit)
## 
## Call:
## glm(formula = y ~ ., family = "binomial", data = data.frame(x1, 
##     x2, y))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.29828  -1.16084  -0.00025   1.17506   1.30403  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.0001294  0.0896375   0.001    0.999
## x1           0.1760536  0.3142119   0.560    0.575
## x2          -0.4232675  0.3118969  -1.357    0.175
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 693.15  on 499  degrees of freedom
## Residual deviance: 691.01  on 497  degrees of freedom
## AIC: 697.01
## 
## Number of Fisher Scoring iterations: 3
Neither X1 nor X2 is statistically significant in this model, as expected when the true boundary is quadratic but only linear terms are supplied.

  (d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
glm.pred=predict(glm.fit,data.frame(x1,x2))   # linear predictor (log-odds); values > 0 correspond to predicted class 1
plot(x1,x2,col=ifelse(glm.pred>0,'red','blue'),pch=ifelse(as.integer(glm.pred>0) == y,1,4))

  (e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1×X2, log(X2), and so forth).
glm.poly.fit=glm(y~poly(x1,2)+poly(x2,2) ,family='binomial', data=data.frame(x1,x2,y))
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
The warnings occur because the quadratic terms separate the two classes perfectly (y is a deterministic function of x1^2 - x2^2), so the fitted probabilities are pushed to 0 and 1 and the fitting algorithm does not converge.

  (f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
glm.poly.pred=predict(glm.poly.fit,data.frame(x1,x2))
plot(x1,x2,col=ifelse(glm.poly.pred>0,'red','blue'),pch=ifelse(as.integer(glm.poly.pred>0) == y,1,4))

  (g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.fit=svm(y~.,data=data.frame(x1,x2,y=as.factor(y)),kernel='linear')
svm.pred=predict(svm.fit,data.frame(x1,x2))
plot(x1,x2, col=ifelse(svm.pred!=0,'red','blue'),pch=ifelse(svm.pred==y,1,4))

  (h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.poly.fit=svm(y~.,data=data.frame(x1,x2,y=as.factor(y)),kernel='polynomial',degree=2)
svm.poly.pred=predict(svm.poly.fit,data.frame(x1,x2))
plot(x1,x2,col=ifelse(svm.poly.pred!=0,'red','blue'),pch=ifelse(svm.poly.pred == y,1,4))
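e1071 also provides a plot() method for svm objects that shades the predicted class regions, which makes the two fitted boundaries easier to compare; a minimal sketch, reusing the fits from above:

dat = data.frame(x1,x2,y=as.factor(y))
plot(svm.fit,dat)        # linear kernel: straight boundary
plot(svm.poly.fit,dat)   # polynomial kernel: curved boundary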

  (i) Comment on your results.

The SVM with a polynomial kernel appears to have the best predictive power: it captures the non-linear relationship between the predictors and the response. The logistic regression model with polynomial terms is also more accurate than the SVM with a linear kernel.
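A rough check of this claim is to compare the training error rates of the four fits, reusing the prediction objects created above (output not shown):

mean((glm.pred > 0) != y)        # logistic regression, linear terms
mean((glm.poly.pred > 0) != y)   # logistic regression, quadratic terms
mean(svm.pred != y)              # support vector classifier, linear kernel
mean(svm.poly.pred != y)         # SVM, polynomial kernel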

7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

  (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
Auto <- Auto   # copy the ISLR Auto data into the workspace
# recode mpg as a binary indicator: 1 if above the median, 0 otherwise
Auto$mpg=ifelse(Auto$mpg>median(Auto$mpg),1,0)
table(Auto$mpg)
## 
##   0   1 
## 196 196
  (b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
costs=data.frame(cost=seq(0.05,100,length.out = 15))               
svm.tune=tune(svm,mpg~.,data=Auto,ranges=costs,kernel='linear')     
svm.tune
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  0.05
## 
## - best performance: 0.1024354
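Because mpg is stored here as a numeric 0/1 variable, svm() fits a regression, so the cross-validated "error" above (lowest, about 0.10, at the smallest cost in the grid, 0.05) is a mean squared error rather than a misclassification rate. A minimal sketch of the classification variant, recoding the response as a factor in a copy of the data (output not shown):

Auto.cls = Auto
Auto.cls$mpg = as.factor(Auto.cls$mpg)   # factor response => C-classification
svm.tune.cls = tune(svm,mpg~.,data=Auto.cls,ranges=costs,kernel='linear')
summary(svm.tune.cls)                    # CV misclassification error for every cost value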
  (c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
params = data.frame(cost=seq(0.05,100,length.out = 5),degree=seq(1,100,length.out = 5))
svm.poly = tune(svm,mpg~.,data=Auto,ranges=params,kernel='polynomial')
svm.poly
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##   100      1
## 
## - best performance: 0.09770715
summary(svm.poly)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##   100      1
## 
## - best performance: 0.09770715 
## 
## - Detailed performance results:
##        cost degree      error dispersion
## 1    0.0500   1.00 0.30626637 0.04705768
## 2   25.0375   1.00 0.10253181 0.04963406
## 3   50.0250   1.00 0.10061535 0.05216098
## 4   75.0125   1.00 0.09906474 0.05223427
## 5  100.0000   1.00 0.09770715 0.05127389
## 6    0.0500  25.75 0.52807507 0.05673551
## 7   25.0375  25.75 0.52807507 0.05673551
## 8   50.0250  25.75 0.52807507 0.05673551
## 9   75.0125  25.75 0.52807507 0.05673551
## 10 100.0000  25.75 0.52807507 0.05673551
## 11   0.0500  50.50 0.52807507 0.05673551
## 12  25.0375  50.50 0.52807507 0.05673551
## 13  50.0250  50.50 0.52807507 0.05673551
## 14  75.0125  50.50 0.52807507 0.05673551
## 15 100.0000  50.50 0.52807507 0.05673551
## 16   0.0500  75.25 0.52807507 0.05673551
## 17  25.0375  75.25 0.52807507 0.05673551
## 18  50.0250  75.25 0.52807507 0.05673551
## 19  75.0125  75.25 0.52807507 0.05673551
## 20 100.0000  75.25 0.52807507 0.05673551
## 21   0.0500 100.00 0.52807507 0.05673551
## 22  25.0375 100.00 0.52807507 0.05673551
## 23  50.0250 100.00 0.52807507 0.05673551
## 24  75.0125 100.00 0.52807507 0.05673551
## 25 100.0000 100.00 0.52807507 0.05673551
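For the polynomial kernel, only degree 1 (a linear boundary) is competitive, with a CV error of about 0.098 at cost = 100; the larger, non-integer degrees produced by the grid all give the same poor error of about 0.53. Part (c) also asks for a radial-basis kernel, which can be tuned the same way; a sketch with a small, arbitrary grid of cost and gamma values (output not shown):

params.radial = data.frame(cost=seq(0.05,100,length.out = 5),gamma=c(0.01,0.1,1,10,100))
svm.radial = tune(svm,mpg~.,data=Auto,ranges=params.radial,kernel='radial')
summary(svm.radial)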
  (d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit, dat, x1∼x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
# CV error as a function of cost for the linear kernel
plot(svm.tune$performances[,c('cost','error')],type='l')

# pairwise plots of cost, degree and CV error for the polynomial kernel
plot(svm.poly$performances)

8. This problem involves the OJ data set which is part of the ISLR package.

  (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(1)
train=sample(1:nrow(OJ),800)   # OJ has 1070 rows; 800 go to the training set
test=(1:nrow(OJ))[-train]      # remaining 270 form the test set
  (b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm.fit.oj=svm(Purchase~.,data=OJ,subset=train,cost=0.01,kernel='linear')
summary(svm.fit.oj)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ, cost = 0.01, kernel = "linear", 
##     subset = train)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
The fit uses 435 of the 800 training observations as support vectors (219 on the CH side and 216 on the MM side), which is expected with such a small cost: a small cost gives a wide margin, so many observations end up as support vectors.

  (c) What are the training and test error rates?
svm.pred.oj.train=predict(svm.fit.oj,OJ[train,])
mean(OJ$Purchase[train]!= svm.pred.oj.train)
## [1] 0.175
svm.pred.oj.test=predict(svm.fit.oj,OJ[test,])
mean(OJ$Purchase[test]!= svm.pred.oj.test)
## [1] 0.1777778
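Confusion matrices give a fuller picture of how these roughly 17-18% error rates break down by class (output not shown):

table(predicted=svm.pred.oj.train, truth=OJ$Purchase[train])
table(predicted=svm.pred.oj.test, truth=OJ$Purchase[test])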
  (d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
svm.tune.oj=tune(svm,Purchase~.,data=OJ[train,],ranges=data.frame(cost=seq(0.01,10,25)),kernel='linear')
summary(svm.tune.oj)
## 
## Error estimation of 'svm' using 10-fold cross validation: 0.17375

0.1775
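Because the step size 25 exceeds the range, seq(0.01,10,25) contains only the single value 0.01, so the call above merely re-estimates the error at cost = 0.01. A sketch of tuning over an actual grid of costs between 0.01 and 10 (output not shown):

cost.grid = data.frame(cost=10^seq(-2,1,length.out = 25))   # 25 values from 0.01 to 10
svm.tune.oj2 = tune(svm,Purchase~.,data=OJ[train,],ranges=cost.grid,kernel='linear')
summary(svm.tune.oj2)
svm.tune.oj2$best.parameters   # cost with the lowest CV error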

  (e) Compute the training and test error rates using this new value for cost.
oj.svm.fit2=svm(Purchase~.,data=OJ,subset=train,cost=0.1775,kernel='linear')
oj.svm.pred.train2=predict(oj.svm.fit2,OJ[train,])
mean(OJ$Purchase[train]!= oj.svm.pred.train2)
## [1] 0.1675
oj.svm.pred.test2=predict(oj.svm.fit2,OJ[test,])
mean(OJ$Purchase[test]!= oj.svm.pred.test2)
## [1] 0.1666667
  (f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm.fit3=svm(Purchase~.,data=OJ,subset=train,cost=0.01,kernel='radial')
svm.train.pred3=predict(svm.fit3,OJ[train,])
mean(OJ$Purchase[train]!= svm.train.pred3)
## [1] 0.39375
svm.test.pred3=predict(svm.fit3,OJ[test,])
mean(OJ$Purchase[test]!= svm.test.pred3)
## [1] 0.3777778
  (g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.
svm.fit4=svm(Purchase~.,data=OJ,subset=train,cost=0.01,kernel='polynomial',degree=2)
svm.train.pred4=predict(svm.fit4,OJ[train,])
mean(OJ$Purchase[train]!= svm.train.pred4)
## [1] 0.3725
svm.test.pred4=predict(svm.fit4,OJ[test,])
mean(OJ$Purchase[test]!= svm.test.pred4)
## [1] 0.3666667
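Parts (f) and (g) also call for repeating the cost tuning of (d)-(e); a sketch covering both kernels over the same range of costs (output not shown):

cost.grid = data.frame(cost=10^seq(-2,1,length.out = 10))   # 0.01 to 10
svm.tune.radial = tune(svm,Purchase~.,data=OJ[train,],ranges=cost.grid,kernel='radial')
svm.tune.poly = tune(svm,Purchase~.,data=OJ[train,],ranges=cost.grid,kernel='polynomial',degree=2)
svm.tune.radial$best.performance   # CV error at the best cost, radial kernel
svm.tune.poly$best.performance     # CV error at the best cost, polynomial kernel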
  (h) Overall, which approach seems to give the best results on this data?

The linear kernel with cost = 0.1775 gave the lowest test error rate (about 16.7%), while the radial and polynomial kernels, fit only at cost = 0.01, did far worse. Overall, the linear support vector classifier with a well-chosen cost parameter seems to give the best results on this data.