5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

library(caret)
## Warning: package 'caret' was built under R version 4.0.5
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.3
library(e1071)
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.5

(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows: > x1 = runif(500) - 0.5 > x2 = runif(500) - 0.5 > y = 1 * (x1^2 - x2^2 > 0)

set.seed(10)
x1= runif(500, min = 0, max = 1) - 0.5
x2= runif(500, min = 0, max = 1) - 0.5
y = 1*( x1^2 - x2^2 > 0)

(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

plot(x1[y == 0], x2[y == 0], col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(x1[y == 1], x2[y == 1], col = "red", pch = 1)

(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.

logit.fit <- glm(y ~ x1 + x2, family = "binomial")
summary(logit.fit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial")
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.347  -1.152   1.024   1.166   1.297  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  0.02449    0.08988   0.272   0.7853  
## x1           0.23573    0.31281   0.754   0.4511  
## x2          -0.52372    0.30367  -1.725   0.0846 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 693.12  on 499  degrees of freedom
## Residual deviance: 689.57  on 497  degrees of freedom
## AIC: 695.57
## 
## Number of Fisher Scoring iterations: 3

(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

data = data.frame(x1 = x1, x2 = x2, y = y)
prob = predict(logit.fit, data, type = "response")
pred = ifelse(prob > 0.52, 1, 0)
data.pos = data[pred == 1, ]
data.neg = data[pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)

(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).

nonlin.fit = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), data = data, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(nonlin.fit)
## 
## Call:
## glm(formula = y ~ poly(x1, 2) + poly(x2, 2) + I(x1 * x2), family = binomial, 
##     data = data)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -1.640e-03  -2.000e-08   2.000e-08   2.000e-08   1.371e-03  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)     -119.0    25735.9  -0.005    0.996
## poly(x1, 2)1    -324.5   193257.8  -0.002    0.999
## poly(x1, 2)2   35118.5   990829.8   0.035    0.972
## poly(x2, 2)1   -3448.9   203826.6  -0.017    0.986
## poly(x2, 2)2  -37819.5  1016666.2  -0.037    0.970
## I(x1 * x2)      -118.3   299756.2   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.9312e+02  on 499  degrees of freedom
## Residual deviance: 5.9211e-06  on 494  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 25

(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

probs <- predict(nonlin.fit, data, type = "response")
preds <- ifelse(probs > 0.50, 1, 0)
data.pos = data[preds == 1, ]
data.neg = data[preds == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
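
As a quick check, the training error rates of the two logistic fits can be compared directly, reusing the pred and preds vectors created above; given the near-zero residual deviance of the quadratic fit, its training error should be essentially zero, while the linear-term fit does little better than the null model.

# Training error rate of the linear-term logistic fit
mean(pred != data$y)
# Training error rate of the logistic fit with quadratic terms
mean(preds != data$y)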

(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm.fit = svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)

(h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm.fit = svm(as.factor(y) ~ x1 + x2, data, gamma = 1)
svm.pred = predict(svm.fit, data)
data.pos = data[svm.pred == 1, ]
data.neg = data[svm.pred == 0, ]
plot(data.pos$x1, data.pos$x2, col = "black", xlab = "X1", ylab = "X2", pch = 2)
points(data.neg$x1, data.neg$x2, col = "red", pch = 1)
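
To support the comparison in (i), the training error rates of the two SVM fits can be computed as well; a minimal sketch that refits both models, since svm.fit was overwritten above:

# Support vector classifier (linear kernel) and radial-kernel SVM
svm.lin = svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.rad = svm(as.factor(y) ~ x1 + x2, data, gamma = 1)
# Training error rates of the two fits
mean(predict(svm.lin, data) != data$y)
mean(predict(svm.rad, data) != data$y)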

(i) Comment on your results.

The logistic regression with only linear terms and the support vector classifier with a linear kernel both produce linear decision boundaries, so neither can capture the true quadratic boundary between the two classes. In contrast, the logistic regression with quadratic transformations of the features and the SVM with a radial kernel both recover a decision boundary very close to the true quadratic one, showing that non-linear feature transformations and non-linear kernels provide similar flexibility.

7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

library(ISLR)
attach(Auto)
## The following object is masked from package:ggplot2:
## 
##     mpg
summary(Auto)
##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365

(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

above=ifelse(mpg>median(mpg), 1,0)
Auto=data.frame(Auto, above)
summary(Auto)
##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365  
##      above    
##  Min.   :0.0  
##  1st Qu.:0.0  
##  Median :0.5  
##  Mean   :0.5  
##  3rd Qu.:1.0  
##  Max.   :1.0  
## 

(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

set.seed(10)
tune.out = tune(svm, above ~ ., data = Auto, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.06810309 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.08291148 0.02165372
## 2 1e-01 0.07654201 0.02370418
## 3 1e+00 0.06810309 0.01237864
## 4 5e+00 0.07582447 0.01623029
## 5 1e+01 0.08106759 0.01733018
## 6 1e+02 0.10509634 0.02755424
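
Among the costs considered, cost = 1 gives the lowest cross-validated error (about 0.068), with performance degrading at both very small and very large values. Note, however, that above was left as a numeric 0/1 variable, so tune() and svm() treat this as a regression problem and the errors reported above are mean squared errors rather than misclassification rates; mpg is also still among the predictors even though it determines above exactly. A minimal sketch of the classification setup is shown below (AutoClass is an illustrative name, and dropping the name factor is an assumption, not part of the original code):

# Work on a copy so the numeric coding used above is left untouched,
# recode the response as a factor so svm() performs C-classification,
# and exclude mpg (which defines the response) and the name factor.
AutoClass = Auto
AutoClass$above = as.factor(AutoClass$above)
set.seed(10)
tune.class = tune(svm, above ~ . - mpg - name, data = AutoClass, kernel = "linear",
                  ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.class)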

(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

tune.out = tune(svm, above ~ ., data = Auto, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   100
## 
## - best performance: 0.0656574 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.39569623 0.02527089
## 2 1e-01 0.09818807 0.01472997
## 3 1e+00 0.08008306 0.01672601
## 4 5e+00 0.07620118 0.01853415
## 5 1e+01 0.07454127 0.01927523
## 6 1e+02 0.06565740 0.01700316
tune.out = tune(svm, above ~ ., data = Auto, kernel = "polynomial", ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   100
## 
## - best performance: 0.2018691 
## 
## - Detailed performance results:
##    cost     error dispersion
## 1 1e-02 0.5075035 0.03639453
## 2 1e-01 0.5067402 0.03637260
## 3 1e+00 0.4990909 0.03616796
## 4 5e+00 0.4662392 0.03615754
## 5 1e+01 0.4277071 0.03759331
## 6 1e+02 0.2018691 0.04332831
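
For the radial kernel, the cross-validated error decreases as cost grows and is lowest at cost = 100 (about 0.066); the polynomial kernel does far worse across the whole grid, with its best error around 0.20. The exercise also asks for different values of gamma and degree, which the calls above leave at their defaults; a hedged sketch of that broader search follows (the grids are illustrative choices, and the same response-coding caveat noted above applies):

# Radial kernel: search over cost and gamma jointly
tune.radial = tune(svm, above ~ ., data = Auto, kernel = "radial",
                   ranges = list(cost = c(0.1, 1, 10, 100),
                                 gamma = c(0.01, 0.1, 1, 10)))
summary(tune.radial)
# Polynomial kernel: search over cost and degree jointly
tune.poly = tune(svm, above ~ ., data = Auto, kernel = "polynomial",
                 ranges = list(cost = c(0.1, 1, 10, 100),
                               degree = c(2, 3, 4)))
summary(tune.poly)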

(d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing > plot(svmfit, dat), where svmfit contains your fitted model and dat is a data frame containing your data, you can type > plot(svmfit, dat, x1 ~ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.

# Convert the response to a factor so that svm() fits classification
# models, which is what plot.svm() expects.
Auto$above = as.factor(Auto$above)
svm.polynomial = svm(above ~ ., data = Auto, kernel = "polynomial", cost = 10, degree = 2)
svm.linear = svm(above ~ ., data = Auto, kernel = "linear", cost = 1)
svm.radial = svm(above ~ ., data = Auto, kernel = "radial", cost = 10, gamma = 0.01)

# Plot the fitted class regions against mpg for each remaining predictor,
# skipping the response (above), mpg itself, and the name factor.
plotpairs = function(fit){
  for (name in names(Auto)[!(names(Auto) %in% c("mpg", "above", "name"))]) {
    plot(fit, Auto, as.formula(paste("mpg~", name, sep="")))
  }
}
plotpairs(svm.polynomial)
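
The same helper can be applied to the other two fits to back up the comparisons in (b) and (c):

plotpairs(svm.linear)
plotpairs(svm.radial)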

8. This problem involves the OJ data set which is part of the ISLR package.

(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

train = sample(dim(OJ)[1], 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]

(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = 0.01)
summary(svm.linear)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  425
## 
##  ( 212 213 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

(c) What are the training and test error rates?

predTrain = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, predTrain)
##     predTrain
##       CH  MM
##   CH 422  56
##   MM  77 245
predTest = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, predTest)
##     predTest
##       CH  MM
##   CH 149  26
##   MM  27  68
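
From these tables, the training error rate is (56 + 77)/800 ≈ 16.6% and the test error rate is (26 + 27)/270 ≈ 19.6%. The same rates can be computed directly:

# Misclassification rates for the linear support vector classifier
mean(predTrain != OJ.train$Purchase)
mean(predTest != OJ.test$Purchase)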

(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear", ranges = list(cost = 10^seq(-2,1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.16625 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.17500 0.04894725
## 2   0.01778279 0.17250 0.04556741
## 3   0.03162278 0.17250 0.04518481
## 4   0.05623413 0.17000 0.04257347
## 5   0.10000000 0.17000 0.04174992
## 6   0.17782794 0.16875 0.03963812
## 7   0.31622777 0.16750 0.03641962
## 8   0.56234133 0.16750 0.04048319
## 9   1.00000000 0.16625 0.03955042
## 10  1.77827941 0.16750 0.03496029
## 11  3.16227766 0.16875 0.03644345
## 12  5.62341325 0.16750 0.03917553
## 13 10.00000000 0.16875 0.04093101

(e) Compute the training and test error rates using this new value for cost.

svm.linear = svm(Purchase ~ ., kernel = "linear", data = OJ.train, cost = tune.out$best.parameters$cost)
predTrain = predict(svm.linear, OJ.train)
table(OJ.train$Purchase, predTrain)
##     predTrain
##       CH  MM
##   CH 416  62
##   MM  63 259
predTest = predict(svm.linear, OJ.test)
table(OJ.test$Purchase, predTest)
##     predTest
##       CH  MM
##   CH 147  28
##   MM  23  72
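
With the tuned cost, the training error rate falls slightly to (62 + 63)/800 ≈ 15.6% and the test error rate to (28 + 23)/270 ≈ 18.9%.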

(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial")
summary(svm.radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  360
## 
##  ( 176 184 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
predTrain = predict(svm.radial, OJ.train)
table(OJ.train$Purchase, predTrain)
##     predTrain
##       CH  MM
##   CH 435  43
##   MM  70 252
predTest = predict(svm.radial, OJ.test)
table(OJ.test$Purchase, predTest)
##     predTest
##       CH  MM
##   CH 150  25
##   MM  28  67
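
At the default cost, the radial kernel gives a training error rate of (43 + 70)/800 ≈ 14.1% and a test error rate of (25 + 28)/270 ≈ 19.6%.
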
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.16375 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.40250 0.05945353
## 2   0.01778279 0.40250 0.05945353
## 3   0.03162278 0.26375 0.08749008
## 4   0.05623413 0.19375 0.04093101
## 5   0.10000000 0.18500 0.03162278
## 6   0.17782794 0.17000 0.03736085
## 7   0.31622777 0.17125 0.03537988
## 8   0.56234133 0.16500 0.03717451
## 9   1.00000000 0.16375 0.03557562
## 10  1.77827941 0.16625 0.03537988
## 11  3.16227766 0.17125 0.03634805
## 12  5.62341325 0.17250 0.03322900
## 13 10.00000000 0.17750 0.02934469
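
Part (f) also calls for repeating part (e); here the cross-validated optimum is cost = 1, which is the default already used above, so the training and test tables shown earlier apply unchanged. For completeness, a sketch of the refit:

# Refit the radial-kernel SVM at the cross-validated cost and recompute
# the confusion tables (identical to the fit above when the selected
# cost equals the default of 1).
svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial",
                 cost = tune.out$best.parameters$cost)
table(OJ.train$Purchase, predict(svm.radial, OJ.train))
table(OJ.test$Purchase, predict(svm.radial, OJ.test))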

(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

svm.polynomial = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial")
summary(svm.polynomial)
predTrain = predict(svm.polynomial, OJ.train)
table(OJ.train$Purchase, predTrain)
##     predTrain
##       CH  MM
##   CH 446  32
##   MM  88 234
predTest = predict(svm.polynomial, OJ.test)
table(OJ.test$Purchase, predTest)
##     predTest
##       CH  MM
##   CH 154  21
##   MM  35  60
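
From these tables, the training error rate is (32 + 88)/800 = 15.0% and the test error rate is (21 + 35)/270 ≈ 20.7%. Note that the exercise specifies degree = 2, whereas the fit above uses the default degree of 3; a minimal sketch of the degree-2 version (svm.poly2 is an illustrative name, and no output is reproduced here):

# Polynomial kernel with degree fixed at 2, as the exercise requests
svm.poly2 = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2)
summary(svm.poly2)
table(OJ.train$Purchase, predict(svm.poly2, OJ.train))
table(OJ.test$Purchase, predict(svm.poly2, OJ.test))
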
tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##      cost
##  3.162278
## 
## - best performance: 0.185 
## 
## - Detailed performance results:
##           cost   error dispersion
## 1   0.01000000 0.38000 0.03496029
## 2   0.01778279 0.37750 0.03216710
## 3   0.03162278 0.34750 0.04669642
## 4   0.05623413 0.32750 0.04479893
## 5   0.10000000 0.28875 0.04226652
## 6   0.17782794 0.22500 0.04124790
## 7   0.31622777 0.20875 0.04168749
## 8   0.56234133 0.19625 0.03387579
## 9   1.00000000 0.19250 0.03395258
## 10  1.77827941 0.18625 0.03701070
## 11  3.16227766 0.18500 0.04322101
## 12  5.62341325 0.19000 0.04401704
## 13 10.00000000 0.19000 0.04281744
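
The tuning above also leaves degree at its default; a hedged sketch that fixes degree = 2 during tuning and then refits at the selected cost, mirroring parts (d) and (e) (tune.poly2 and svm.poly2.tuned are illustrative names):

# Tune cost for the degree-2 polynomial kernel (degree is passed through
# to svm() by tune()), then refit at the selected cost.
tune.poly2 = tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial",
                  degree = 2, ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
svm.poly2.tuned = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial",
                      degree = 2, cost = tune.poly2$best.parameters$cost)
table(OJ.train$Purchase, predict(svm.poly2.tuned, OJ.train))
table(OJ.test$Purchase, predict(svm.poly2.tuned, OJ.test))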

(h) Overall, which approach seems to give the best results on this data?

Comparing the three kernels, the linear and radial fits give the best and very similar results (cross-validated errors of about 0.166 and 0.164, and test error rates of roughly 19-20%), while the polynomial kernel does noticeably worse (cross-validated error of about 0.185 and a test error rate above 20%). None of the SVMs achieve a very low test error rate on this data, but the radial kernel has a slight edge in cross-validation.