Author: Jay Liao (ID: RE6094028)

Load the required packages

library(ISLR)
library(dplyr)
library(e1071)
library(tidyverse)
library(ggplot2)
library(ggthemes)
library(caret)

Exercise 9.5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

Exercise 9.5 - (a)

Generate a data set with \(n = 500\) and \(p = 2\), such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

set.seed(4028)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1*(x1^2-x2^2 > 0)

Exercise 9.5 - (b)

Plot the observations, colored according to their class labels. Your plot should display \(X_1\) on the x-axis, and \(X_2\) on the y-axis.

par(pty = 's')
plot(x1[y == 0], x2[y == 0], col = 'darkorange2', pch = 19,
     xlab = 'X1', ylab = 'X2')
points(x1[y == 1], x2[y == 1], col = 'cornflowerblue', pch = 19)

The boundary between the two classes appears to be \(X_1^2 - X_2^2 = 0\), i.e. \(X_2 = \pm X_1\), which is a pair of straight lines rather than a quadratic curve.
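
As a quick check, redrawing the plot above and overlaying the lines \(X_2 = X_1\) and \(X_2 = -X_1\) confirms that they trace the class boundary:

par(pty = 's')
plot(x1[y == 0], x2[y == 0], col = 'darkorange2', pch = 19,
     xlab = 'X1', ylab = 'X2')
points(x1[y == 1], x2[y == 1], col = 'cornflowerblue', pch = 19)
abline(a = 0, b = 1, lty = 2)   # the line X2 = X1
abline(a = 0, b = -1, lty = 2)  # the line X2 = -X1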

Exercise 9.5 - (c)

Fit a logistic regression model to the data, using \(X_1\) and \(X_2\) as predictors.

dta <- data.frame(x1 = x1, x2 = x2, y = factor(y))
LR_fit <- glm(y ~ ., data = dta, family = binomial('logit'))
summary(LR_fit)

Call:
glm(formula = y ~ ., family = binomial("logit"), data = dta)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.234  -1.158  -1.113   1.184   1.237  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.01471    0.08954  -0.164    0.869
x1           0.06287    0.31440   0.200    0.841
x2           0.25593    0.31243   0.819    0.413

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 693.12  on 499  degrees of freedom
Residual deviance: 692.43  on 497  degrees of freedom
AIC: 698.43

Number of Fisher Scoring iterations: 3

Exercise 9.5 - (d)

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

LR_prob <- predict(LR_fit, newdata = dta, type = 'response')
LR_pred <- ifelse(LR_prob > 0.5, 1, 0)
par(pty = 's')
plot(dta$x1, dta$x2, col = LR_pred + 8, pch = 19)  # color points by predicted class

The decision boundary is linear and separates the two classes poorly.
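
A minimal sketch of drawing the implied boundary, i.e. the line where \(\hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 = 0\), on top of the predicted-class plot:

cf <- coef(LR_fit)
par(pty = 's')
plot(dta$x1, dta$x2, col = LR_pred + 8, pch = 19, xlab = 'X1', ylab = 'X2')
abline(a = -cf[1] / cf[3], b = -cf[2] / cf[3], lty = 2, lwd = 2)  # fitted linear boundary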

Exercise 9.5 - (e)

Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors (e.g. \(X_1^2\), \(X_1 \times X_2\), \(log(X_2)\), and so forth).

NLR_fit <- glm(y ~ poly(x1, 2) + poly(x2, 2),
               data = dta, family = binomial('logit'))
summary(NLR_fit)

Call:
glm(formula = y ~ poly(x1, 2) + poly(x2, 2), family = binomial("logit"), 
    data = dta)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.402e-03  -2.000e-08  -2.000e-08   2.000e-08   2.153e-03  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      -71.05    1334.45  -0.053    0.958
poly(x1, 2)1    1498.46   28337.07   0.053    0.958
poly(x1, 2)2   64135.00 1141702.27   0.056    0.955
poly(x2, 2)1   -2740.83   50880.27  -0.054    0.957
poly(x2, 2)2  -60159.05 1071217.31  -0.056    0.955

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.9312e+02  on 499  degrees of freedom
Residual deviance: 1.3207e-05  on 495  degrees of freedom
AIC: 10

Number of Fisher Scoring iterations: 25

None of the coefficients reaches statistical significance. This is expected here: the quadratic model separates the training classes almost perfectly (the residual deviance is essentially zero), so the coefficient estimates and their standard errors blow up and the Wald z-tests become uninformative.

Exercise 9.5 - (f)

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

NLR_prob <- predict(NLR_fit, newdata = dta, type = 'response')
NLR_pred <- ifelse(NLR_prob > 0.5, 1, 0)
par(pty = 's')
plot(dta$x1, dta$x2, col = NLR_pred + 28, pch = 19)  # color points by predicted class

The non-linear logistic model classifies the training data much better than the linear one; the training error rates are compared in the sketch below.
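
To back this up with numbers, a short sketch comparing the training error rates of the two logistic fits (using the LR_pred and NLR_pred vectors computed above):

mean(LR_pred != y)   # training error of the linear logistic model
mean(NLR_pred != y)  # training error of the quadratic logistic model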

Exercise 9.5 - (g)

Fit a support vector classifier to the data with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm_L <- svm(y ~ ., data = dta, kernel = 'linear', cost = 0.01)
plot(svm_L, data = dta)

Exercise 9.5 - (h)

Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

svm_NL <- svm(y ~ ., data = dta, kernel = 'radial', gamma = 1)
plot(svm_NL, data = dta)

Exercise 9.5 - (i)

Comment on your results.

[ANS] The SVM with a linear kernel also performs poorly: many misclassified observations are visible even on the training data, and on a held-out test set the linear model would likely do even worse. The SVM with a non-linear (radial) kernel performs much better.
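
A short sketch backing this up with in-sample confusion tables for the two SVM fits (no test set was split off here, so these are training-data numbers):

table(truth = dta$y, pred = predict(svm_L, dta))   # linear kernel
table(truth = dta$y, pred = predict(svm_NL, dta))  # radial kernel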

Exercise 9.7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

data(Auto)
Auto <- as_tibble(Auto)  # as.tibble() is deprecated in favor of as_tibble()
theme_set(theme_tufte(base_size = 14))
set.seed(4028)
summary(Auto)
      mpg          cylinders      displacement     horsepower        weight    
 Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
 1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
 Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
 Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
 3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
 Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
                                                                               
  acceleration        year           origin                      name    
 Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
 1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
 Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
 Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
 3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
 Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
                                                 (Other)           :365  

Exercise 9.7 - (a)

Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

Auto <- Auto %>%
  mutate(mpg_dummy = as.integer(mpg > median(mpg))) %>% 
  mutate(mpg_dummy = factor(mpg_dummy), cylinders = factor(cylinders)) %>%
  dplyr::select(-mpg, -name)
dummy_trans <- dummyVars(mpg_dummy ~ ., data = Auto)
Auto_dummy <- predict(dummy_trans, Auto)
head(Auto_dummy)
  cylinders.3 cylinders.4 cylinders.5 cylinders.6 cylinders.8 displacement
1           0           0           0           0           1          307
2           0           0           0           0           1          350
3           0           0           0           0           1          318
4           0           0           0           0           1          304
5           0           0           0           0           1          302
6           0           0           0           0           1          429
  horsepower weight acceleration year origin
1        130   3504         12.0   70      1
2        165   3693         11.5   70      1
3        150   3436         11.0   70      1
4        150   3433         12.0   70      1
5        140   3449         10.5   70      1
6        198   4341         10.0   70      1
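
As a quick sanity check, splitting at the median should give two classes of roughly equal size:

table(Auto$mpg_dummy)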

Exercise 9.7 - (b)

Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

Conduct 10-fold cross-validation to find the best model

SVM_linear <- train(x = Auto_dummy, y = Auto$mpg_dummy,
                    method = 'svmLinear2',
                    trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
                    preProcess = c('center', 'scale'),
                    tuneGrid = expand.grid(cost = seq(1, 20, by = 1)))
SVM_linear$finalModel

Call:
svm.default(x = as.matrix(x), y = y, kernel = "linear", cost = param$cost, 
    probability = classProbs)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  2 

Number of Support Vectors:  75
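
The exercise asks for the cross-validation error at each value of cost, while only the selected final model is printed above. A minimal sketch of pulling the per-cost CV results out of the caret object:

SVM_linear$results %>%
  mutate(CV_error = 1 - Accuracy) %>%
  dplyr::select(cost, CV_error)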

Obtain the performance of the final model

# Refit the linear support vector classifier on the full data
# (note: svm() defaults to cost = 1 here; the cross-validation above selected cost = 2)
SVM_linear_finalModel <- svm(Auto$mpg_dummy ~ ., data = Auto_dummy,
                             kernel = 'linear')
SVM_linear_pred <- predict(SVM_linear_finalModel, Auto_dummy)
table(Auto$mpg_dummy, SVM_linear_pred)
   SVM_linear_pred
      0   1
  0 178  18
  1   9 187
mean(Auto$mpg_dummy == SVM_linear_pred)
[1] 0.9311224

Exercise 9.7 - (c)

Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

Radial basis kernels

SVM_radial <- train(x = Auto_dummy, y = Auto$mpg_dummy,
                    method = 'svmRadial',
                    trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
                    preProcess = c('center', 'scale'),
                    tuneGrid = expand.grid(C = seq(0.001, 3, length.out = 10),
                                           sigma = seq(0.2, 2, length.out = 5)))
SVM_radial$finalModel
Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 0.334222222222222 

Gaussian Radial Basis kernel function. 
 Hyperparameter : sigma =  1.1 

Number of Support Vectors : 226 

Objective Function Value : -41.4342 
Training error : 0.038265 

Polynomial basis kernels

SVM_poly <- train(x = Auto_dummy, y = Auto$mpg_dummy,
                  method = 'svmPoly',
                  trControl = trainControl(method = 'cv', number = 10, allowParallel = TRUE),
                  preProcess = c('center', 'scale'),
                  tuneGrid = expand.grid(degree = seq(1, 8, by = 1),
                                         C = seq(1, 5, by = 1),
                                         scale = TRUE))
SVM_poly$finalModel
Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 2 

Polynomial kernel function. 
 Hyperparameters : degree =  1  scale =  TRUE  offset =  1 

Number of Support Vectors : 75 

Objective Function Value : -134.4977 
Training error : 0.071429 
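
As in (b), the per-parameter cross-validation results are stored in the results element of each caret object; a sketch for inspecting them alongside the selected tuning values:

SVM_radial$results[, c('sigma', 'C', 'Accuracy')]
SVM_poly$results[, c('degree', 'C', 'Accuracy')]
SVM_radial$bestTune
SVM_poly$bestTune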

Exercise 9.7 - (d)

Make some plots to back up your assertions in (b) and (c).

Hint: In the lab, we used the plot() function for svm objects only in cases with \(p = 2\). When \(p > 2\), you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing

plot(svmfit, dat)

where svmfit contains your fitted model and dat is a data frame containing your data, you can type

plot(svmfit, dat, x1 ~ x4)

in order to plot just the first and fourth variables. However, you must replace \(x_1\) and \(x_4\) with the correct variable names. To find out more, type ?plot.svm.
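
The caret plots below show cross-validation accuracy against the tuning parameters. For pairwise decision-boundary plots of the kind the hint describes, a sketch (assuming we refit on a data frame, here called Auto_df, that holds the response next to the dummy-coded predictors) could look like:

Auto_df <- data.frame(mpg_dummy = Auto$mpg_dummy, as.data.frame(Auto_dummy))
svm_pairs <- svm(mpg_dummy ~ ., data = Auto_df, kernel = 'radial', gamma = 1)
plot(svm_pairs, Auto_df, weight ~ horsepower)  # remaining predictors are held at 0 by the default slice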

plot(SVM_linear)

plot(SVM_radial)

plot(SVM_poly)

# Training-set accuracy and Kappa for each of the three fitted models
mtx_results <- sapply(list(SVM_linear, SVM_radial, SVM_poly), function(MODEL) {
  postResample(predict(MODEL), Auto$mpg_dummy)
}) %>% t() %>% as.data.frame()
rownames(mtx_results) <- c('SVM_linear', 'SVM_radial', 'SVM_poly')
mtx_results