Exercise 5: We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
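These solutions assume the ISLR, e1071, and caret packages are installed and loaded; a minimal setup:

library(ISLR)   # Auto and OJ data sets
library(e1071)  # svm() and tune()
library(caret)  # confusionMatrix()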

  (a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.
set.seed(1)

x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)
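The true boundary x1^2 = x2^2, i.e. x2 = ±x1, is quadratic in the features: class 1 consists of the points with |x1| > |x2|, so the two classes cannot be separated by a single linear boundary.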
  (b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
plot(x1,x2,col=ifelse(y,'red','navy'),xlab='X1',ylab='X2')

  (c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
dat = data.frame(x1, x2, y = as.factor(y))

glm.fit = glm(y~., data = dat, family = "binomial")
summary(glm.fit)
## 
## Call:
## glm(formula = y ~ ., family = "binomial", data = dat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.179  -1.139  -1.112   1.206   1.257  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260   0.089579  -0.974    0.330
## x1           0.196199   0.316864   0.619    0.536
## x2          -0.002854   0.305712  -0.009    0.993
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.18  on 499  degrees of freedom
## Residual deviance: 691.79  on 497  degrees of freedom
## AIC: 697.79
## 
## Number of Fisher Scoring iterations: 3
  (d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
glm.preds = predict(glm.fit, newdata = dat, type ="response")
plot(x1,x2,col=ifelse(glm.preds>=0.5,'navy','red'),xlab='X1',ylab='X2')

  (e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).
glm.fit2 = glm(y~I(x1*x2) + poly(x2,2) + poly(x1,2), data=dat, family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
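These warnings indicate perfect separation: the quadratic terms can represent the true boundary exactly, so some fitted probabilities are numerically 0 or 1. The coefficient estimates are unstable in this situation, but the predicted class labels below are still usable.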
  (f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
glm.preds2 = predict(glm.fit2, newdata = dat, type ="response")
plot(x1,x2,col=ifelse(glm.preds2>=0.5,'navy','red'),xlab='X1',ylab='X2')

  (g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.fit=svm(y~.,data=dat,kernel='linear',cost=0.01)
svm.preds=predict(svm.fit,newdata=dat)
plot(x1,x2,col=ifelse(svm.preds!="0",'navy','red'),xlab='X1',ylab='X2')

  (h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm.fit2=svm(y~.,data=dat,kernel='radial',gamma=1)
svm.preds2=predict(svm.fit2,newdata=dat)
plot(x1,x2,col=ifelse(svm.preds2!="0",'navy','red'),xlab='X1',ylab='X2')

  (i) Comment on your results.
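With only linear terms, both the logistic regression in (c)-(d) and the linear support vector classifier in (g) produce linear decision boundaries that cannot capture the true quadratic boundary, so their predicted labels match the true classes poorly. Adding quadratic terms to the logistic regression in (e)-(f), or using a radial kernel in (h), recovers a clearly non-linear boundary that closely matches the truth. The practical advantage of the kernel SVM is that it achieves this without manually engineering the non-linear features.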

Exercise 7: In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

  (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
df = Auto
df$y = as.factor(ifelse(df$mpg > median(df$mpg), 1, 0))
str(df)
## 'data.frame':    392 obs. of  10 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
##  $ y           : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
  (b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter.
set.seed(42)
tune.out = tune(svm, y ~ . -mpg -name, data=df, kernel="linear", ranges=list(cost=c(0.1, 1, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.08653846 
## 
## - Detailed performance results:
##   cost      error dispersion
## 1  0.1 0.09673077 0.05699840
## 2  1.0 0.09423077 0.04632467
## 3 10.0 0.08653846 0.03776796
  (c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma, degree, and cost.
set.seed(42)
tune.out = tune(svm, y ~ . -mpg -name, data=df, kernel="radial", ranges=list(cost=c(0.1, 1, 10), gamma=c(0.5, 1, 2)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     1   0.5
## 
## - best performance: 0.07628205 
## 
## - Detailed performance results:
##   cost gamma      error dispersion
## 1  0.1   0.5 0.08660256 0.04961909
## 2  1.0   0.5 0.07628205 0.04267196
## 3 10.0   0.5 0.08634615 0.04391746
## 4  0.1   1.0 0.09673077 0.05699840
## 5  1.0   1.0 0.07878205 0.04472958
## 6 10.0   1.0 0.09403846 0.04383004
## 7  0.1   2.0 0.15269231 0.10000813
## 8  1.0   2.0 0.08378205 0.04837755
## 9 10.0   2.0 0.10173077 0.05012408
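The analogous grid search for the polynomial kernel is sketched below (tune.out.poly is just an illustrative name, the grid values are arbitrary choices, and the CV output is not shown):

set.seed(42)
# illustrative polynomial-kernel search over cost and degree
tune.out.poly = tune(svm, y ~ . -mpg -name, data=df, kernel="polynomial",
                     ranges=list(cost=c(0.1, 1, 10), degree=c(2, 3)))
summary(tune.out.poly)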
  (d) Make some plots to back up your assertions in (b) and (c).
df = df[, -c(1, 9)]  # drop mpg (column 1) and name (column 9) before plotting the fits

set.seed(42)
svm_fit = svm(y ~ ., data=df, kernel='linear', cost=10)
plot(svm_fit, df, displacement~cylinders)

set.seed(42)
svm_fit = svm(y ~ ., data=df, kernel='radial', cost=1, gamma=0.5)
plot(svm_fit, df, weight~acceleration)

set.seed(42)
svm_fit = svm(y ~ ., data=df, kernel='polynomial', cost=10, degree=3)
plot(svm_fit, df, year~horsepower)

Exercise 8: This problem involves the OJ data set which is part of the ISLR package.

  (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
df = OJ

df$Purchase = as.factor(df$Purchase)

set.seed(42)
index = sample(nrow(df), 800)
train = df[index, ]
test = df[-index, ]
  (b) Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm_fit = svm(Purchase~., data=train, kernel='linear', cost=0.01)
summary(svm_fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  432
## 
##  ( 215 217 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
  (c) What are the training and test error rates?
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 432  77
##         MM  60 231
##                                           
##                Accuracy : 0.8288          
##                  95% CI : (0.8008, 0.8542)
##     No Information Rate : 0.615           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6346          
##                                           
##  Mcnemar's Test P-Value : 0.1716          
##                                           
##             Sensitivity : 0.8780          
##             Specificity : 0.7500          
##          Pos Pred Value : 0.8487          
##          Neg Pred Value : 0.7938          
##              Prevalence : 0.6150          
##          Detection Rate : 0.5400          
##    Detection Prevalence : 0.6362          
##       Balanced Accuracy : 0.8140          
##                                           
##        'Positive' Class : CH              
## 
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 142  25
##         MM  19  84
##                                          
##                Accuracy : 0.837          
##                  95% CI : (0.7875, 0.879)
##     No Information Rate : 0.5963         
##     P-Value [Acc > NIR] : <2e-16         
##                                          
##                   Kappa : 0.6585         
##                                          
##  Mcnemar's Test P-Value : 0.451          
##                                          
##             Sensitivity : 0.8820         
##             Specificity : 0.7706         
##          Pos Pred Value : 0.8503         
##          Neg Pred Value : 0.8155         
##              Prevalence : 0.5963         
##          Detection Rate : 0.5259         
##    Detection Prevalence : 0.6185         
##       Balanced Accuracy : 0.8263         
##                                          
##        'Positive' Class : CH             
## 
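The error rate is one minus the reported accuracy: about 1 - 0.8288 = 17.1% on the training set and 1 - 0.837 = 16.3% on the test set. The same numbers can be computed directly:

mean(predict(svm_fit, train) != train$Purchase)  # training error rate
mean(predict(svm_fit, test) != test$Purchase)    # test error rate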
  (d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
set.seed(42)
tune.out = tune(svm, Purchase~., data=train, kernel="linear", ranges=list(cost=c(0.1, 1, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.175 
## 
## - Detailed performance results:
##   cost   error dispersion
## 1  0.1 0.17625 0.03356689
## 2  1.0 0.17500 0.02886751
## 3 10.0 0.18625 0.02729087
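A grid extending down to cost = 0.01, as the exercise suggests (e.g. ranges=list(cost=c(0.01, 0.1, 1, 5, 10))), could also be tried; of the values tried here, cost = 1 gives the lowest CV error.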
  (e) Compute the training and test error rates using this new value for cost.
svm_fit = svm(Purchase~., data=train, kernel='linear', cost=1)

#train
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 434  76
##         MM  58 232
##                                           
##                Accuracy : 0.8325          
##                  95% CI : (0.8048, 0.8577)
##     No Information Rate : 0.615           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6424          
##                                           
##  Mcnemar's Test P-Value : 0.1419          
##                                           
##             Sensitivity : 0.8821          
##             Specificity : 0.7532          
##          Pos Pred Value : 0.8510          
##          Neg Pred Value : 0.8000          
##              Prevalence : 0.6150          
##          Detection Rate : 0.5425          
##    Detection Prevalence : 0.6375          
##       Balanced Accuracy : 0.8177          
##                                           
##        'Positive' Class : CH              
## 
# test
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 140  23
##         MM  21  86
##                                          
##                Accuracy : 0.837          
##                  95% CI : (0.7875, 0.879)
##     No Information Rate : 0.5963         
##     P-Value [Acc > NIR] : <2e-16         
##                                          
##                   Kappa : 0.6605         
##                                          
##  Mcnemar's Test P-Value : 0.8802         
##                                          
##             Sensitivity : 0.8696         
##             Specificity : 0.7890         
##          Pos Pred Value : 0.8589         
##          Neg Pred Value : 0.8037         
##              Prevalence : 0.5963         
##          Detection Rate : 0.5185         
##    Detection Prevalence : 0.6037         
##       Balanced Accuracy : 0.8293         
##                                          
##        'Positive' Class : CH             
## 
  (f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm_fit = svm(Purchase~., data=train, kernel='radial')
summary(svm_fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  375
## 
##  ( 183 192 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
# train
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 453  81
##         MM  39 227
##                                          
##                Accuracy : 0.85           
##                  95% CI : (0.8233, 0.874)
##     No Information Rate : 0.615          
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.675          
##                                          
##  Mcnemar's Test P-Value : 0.000182       
##                                          
##             Sensitivity : 0.9207         
##             Specificity : 0.7370         
##          Pos Pred Value : 0.8483         
##          Neg Pred Value : 0.8534         
##              Prevalence : 0.6150         
##          Detection Rate : 0.5663         
##    Detection Prevalence : 0.6675         
##       Balanced Accuracy : 0.8289         
##                                          
##        'Positive' Class : CH             
## 
# test
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 146  28
##         MM  15  81
##                                           
##                Accuracy : 0.8407          
##                  95% CI : (0.7915, 0.8823)
##     No Information Rate : 0.5963          
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.6627          
##                                           
##  Mcnemar's Test P-Value : 0.06725         
##                                           
##             Sensitivity : 0.9068          
##             Specificity : 0.7431          
##          Pos Pred Value : 0.8391          
##          Neg Pred Value : 0.8438          
##              Prevalence : 0.5963          
##          Detection Rate : 0.5407          
##    Detection Prevalence : 0.6444          
##       Balanced Accuracy : 0.8250          
##                                           
##        'Positive' Class : CH              
## 
set.seed(42)
tune.out = tune(svm, Purchase~., data=train, kernel="radial", ranges=list(gamma=c(0.5, 1, 2)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma
##    0.5
## 
## - best performance: 0.19375 
## 
## - Detailed performance results:
##   gamma   error dispersion
## 1   0.5 0.19375 0.04299952
## 2   1.0 0.20250 0.04158325
## 3   2.0 0.21750 0.04257347
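Strictly, the exercise asks to keep gamma at its default and tune cost as in (d); a sketch of that alternative search is below (tune.out.cost is an illustrative name and the output is not shown), while the fit that follows uses the best gamma found above:

set.seed(42)
# alternative search: default gamma, tuning cost as in (d)
tune.out.cost = tune(svm, Purchase~., data=train, kernel="radial",
                     ranges=list(cost=c(0.01, 0.1, 1, 10)))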
svm_fit = svm(Purchase~., data=train, kernel='radial', gamma=0.5)

# train
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 457  74
##         MM  35 234
##                                          
##                Accuracy : 0.8638         
##                  95% CI : (0.838, 0.8868)
##     No Information Rate : 0.615          
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.7053         
##                                          
##  Mcnemar's Test P-Value : 0.0002729      
##                                          
##             Sensitivity : 0.9289         
##             Specificity : 0.7597         
##          Pos Pred Value : 0.8606         
##          Neg Pred Value : 0.8699         
##              Prevalence : 0.6150         
##          Detection Rate : 0.5713         
##    Detection Prevalence : 0.6637         
##       Balanced Accuracy : 0.8443         
##                                          
##        'Positive' Class : CH             
## 
# test
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 144  32
##         MM  17  77
##                                           
##                Accuracy : 0.8185          
##                  95% CI : (0.7673, 0.8626)
##     No Information Rate : 0.5963          
##     P-Value [Acc > NIR] : 3.798e-15       
##                                           
##                   Kappa : 0.6145          
##                                           
##  Mcnemar's Test P-Value : 0.0455          
##                                           
##             Sensitivity : 0.8944          
##             Specificity : 0.7064          
##          Pos Pred Value : 0.8182          
##          Neg Pred Value : 0.8191          
##              Prevalence : 0.5963          
##          Detection Rate : 0.5333          
##    Detection Prevalence : 0.6519          
##       Balanced Accuracy : 0.8004          
##                                           
##        'Positive' Class : CH              
## 
  (g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.
svm_fit = svm(Purchase~., data=train, kernel='polynomial', degree=2)
summary(svm_fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "polynomial", 
##     degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  443
## 
##  ( 217 226 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
# train
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 461 112
##         MM  31 196
##                                           
##                Accuracy : 0.8212          
##                  95% CI : (0.7929, 0.8472)
##     No Information Rate : 0.615           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.603           
##                                           
##  Mcnemar's Test P-Value : 2.233e-11       
##                                           
##             Sensitivity : 0.9370          
##             Specificity : 0.6364          
##          Pos Pred Value : 0.8045          
##          Neg Pred Value : 0.8634          
##              Prevalence : 0.6150          
##          Detection Rate : 0.5763          
##    Detection Prevalence : 0.7163          
##       Balanced Accuracy : 0.7867          
##                                           
##        'Positive' Class : CH              
## 
# test
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 149  41
##         MM  12  68
##                                           
##                Accuracy : 0.8037          
##                  95% CI : (0.7512, 0.8494)
##     No Information Rate : 0.5963          
##     P-Value [Acc > NIR] : 2.768e-13       
##                                           
##                   Kappa : 0.574           
##                                           
##  Mcnemar's Test P-Value : 0.00012         
##                                           
##             Sensitivity : 0.9255          
##             Specificity : 0.6239          
##          Pos Pred Value : 0.7842          
##          Neg Pred Value : 0.8500          
##              Prevalence : 0.5963          
##          Detection Rate : 0.5519          
##    Detection Prevalence : 0.7037          
##       Balanced Accuracy : 0.7747          
##                                           
##        'Positive' Class : CH              
## 
set.seed(42)
tune.out = tune(svm, Purchase~., data=train, kernel="polynomial", ranges=list(degree=c(1, 2, 3)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  degree
##       1
## 
## - best performance: 0.18125 
## 
## - Detailed performance results:
##   degree   error dispersion
## 1      1 0.18125 0.03498512
## 2      2 0.19250 0.04216370
## 3      3 0.19625 0.03586723
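As with the radial kernel, the exercise wording asks to fix degree = 2 and tune cost; a sketch of that search is below (tune.out.cost is an illustrative name, output not shown), while the fit that follows uses the best degree found above:

set.seed(42)
# alternative search: degree fixed at 2, tuning cost
tune.out.cost = tune(svm, Purchase~., data=train, kernel="polynomial", degree=2,
                     ranges=list(cost=c(0.01, 0.1, 1, 10)))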
svm_fit = svm(Purchase~., data=train, kernel='polynomial', degree=1)

# train
svm_preds = predict(svm_fit, train)
confusionMatrix(data=svm_preds, reference=train$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 432  74
##         MM  60 234
##                                           
##                Accuracy : 0.8325          
##                  95% CI : (0.8048, 0.8577)
##     No Information Rate : 0.615           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6433          
##                                           
##  Mcnemar's Test P-Value : 0.2614          
##                                           
##             Sensitivity : 0.8780          
##             Specificity : 0.7597          
##          Pos Pred Value : 0.8538          
##          Neg Pred Value : 0.7959          
##              Prevalence : 0.6150          
##          Detection Rate : 0.5400          
##    Detection Prevalence : 0.6325          
##       Balanced Accuracy : 0.8189          
##                                           
##        'Positive' Class : CH              
## 
# test
svm_preds = predict(svm_fit, test)
confusionMatrix(data=svm_preds, reference=test$Purchase)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CH  MM
##         CH 141  23
##         MM  20  86
##                                           
##                Accuracy : 0.8407          
##                  95% CI : (0.7915, 0.8823)
##     No Information Rate : 0.5963          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6677          
##                                           
##  Mcnemar's Test P-Value : 0.7604          
##                                           
##             Sensitivity : 0.8758          
##             Specificity : 0.7890          
##          Pos Pred Value : 0.8598          
##          Neg Pred Value : 0.8113          
##              Prevalence : 0.5963          
##          Detection Rate : 0.5222          
##    Detection Prevalence : 0.6074          
##       Balanced Accuracy : 0.8324          
##                                           
##        'Positive' Class : CH              
## 
  (h) Overall, which approach seems to give the best results on this data?
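Comparing test accuracies: the radial kernel with default gamma and the tuned degree-1 polynomial (which is effectively a linear classifier) both reach about 84.1%, slightly ahead of the tuned linear classifier at 83.7%, while the radial kernel with gamma = 0.5 (81.9%) and the degree-2 polynomial (80.4%) do worse. The differences are small, which suggests the boundary between the two purchase classes is close to linear in these features; among the fits shown, the radial kernel with default settings gives the best overall results.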