Problem 5 - Part A

In this chunk, I am generating a data set.

set.seed(2)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1*(x1^2-x2^2 > 0)

Problem 5 - Part B

In this chunk I am plotting the observations from part A.

plot(x1, x2, col=ifelse(y, "red", "black"))
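
Optionally, a legend makes the class colouring explicit (a small sketch; the legend position is an arbitrary choice):

legend("topright", legend=c("y = 1", "y = 0"), col=c("red", "black"), pch=1)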

Problem 5 - Part C

In this chunk, I am fitting a logistic regression model to the data.

glmFit <- glm(y~x1+x2, data=data.frame(x1,x2,y), family="binomial")
summary(glmFit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = data.frame(x1, 
##     x2, y))
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.271  -1.193   1.097   1.147   1.209  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.07138    0.08959   0.797    0.426
## x1          -0.03532    0.29825  -0.118    0.906
## x2           0.27548    0.30762   0.896    0.370
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.50  on 499  degrees of freedom
## Residual deviance: 691.67  on 497  degrees of freedom
## AIC: 697.67
## 
## Number of Fisher Scoring iterations: 3

Problem 5 - Part D

In the chunk below, I am applying the model to the training data to obtain a predicted class label for each training observation; because the model is linear in x1 and x2, the resulting decision boundary is linear.

glmPred <- predict(glmFit, newdata=data.frame(x1, x2))
plot(x1, x2, col=ifelse(glmPred>0, "red", "black"), pch=ifelse(as.integer(glmPred>0) == y,1,6))
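
To quantify how poorly a linear boundary separates these classes, one can also compute the training misclassification rate (a short sketch; the resulting value is not part of the original output):

glmClass <- as.integer(glmPred > 0)   # link-scale predictions > 0 correspond to P(y = 1) > 0.5
mean(glmClass != y)                   # training misclassification rate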

Problem 5 - Part E

In this chunk, I am fitting a logistic regression model with non-linear functions (polynomial) of x1 and x2.

glmFit.2 <- glm(y~poly(x1,3)+poly(x2,3), data=data.frame(x1,x2,y), family="binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Problem 5 - Part F

In this chunk, I am plotting the model from part E to provide evidence that the decision boundary is non-linear.

glmPred.2 <- predict(glmFit.2, newdata=data.frame(x1,x2))
plot(x1, x2, col=ifelse(glmPred.2>0, "red", "black"), pch=ifelse(as.integer(glmPred.2>0) == y,1,6))

Problem 5 - Part G

In this chunk, I am using a support vector classifier to obtain class predictions for each training observation. The results indicate that essentially all observations are assigned to the same class.

library(e1071)  # provides svm() and tune()

svmFit <- svm(y~x1+x2, data=data.frame(x1,x2,y), cost=0.1, kernel="linear")
svmPred <- predict(svmFit, data.frame(x1,x2), type="response")
plot(x1, x2, col=ifelse(svmPred>0, "red", "black"), pch=ifelse(as.integer(svmPred>0) == y,1,6))
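
Because y is stored as a numeric 0/1 vector, svm() above actually fits a regression rather than a classifier. As an alternative sketch (an assumption, not part of the original analysis), coding y as a factor makes svm() perform C-classification, so predict() returns class labels directly:

svmFit.cls <- svm(as.factor(y)~x1+x2, data=data.frame(x1,x2,y), cost=0.1, kernel="linear")
svmClass <- predict(svmFit.cls, data.frame(x1,x2))
table(svmClass, y)   # confusion matrix on the training data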

Problem 5 - Part H

In this chunk, I am fitting two support vector classifiers (one polynomial and one radial).

svmFit.2 <- svm(y~x1+x2, data=data.frame(x1,x2,y), degree=1, kernel="polynomial")
svmPred.2 <- predict(svmFit.2, data.frame(x1,x2), type="response")
plot(x1, x2, col=ifelse(svmPred.2>0, "red", "black"), pch=ifelse(as.integer(svmPred.2>0) == y,1,6))

svmFit.3 <- svm(y~x1+x2, data=data.frame(x1,x2,y), cost=1, kernel="radial")
svmPred.3 <- predict(svmFit.3)
plot(x1, x2, col=ifelse(svmPred.3>0, "red", "black"), pch=ifelse(as.integer(svmPred.3>0) == y,1,6))

Problem 5 - Part I

Out of the three models in part G and part H, the radial fit appears to be the best since it has the fewest misclassified observations and assigns predictions to both classes.
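
A short sketch supporting this comparison, assuming the prediction objects from parts G and H are still in the workspace, is to compute each model's training misclassification rate:

mean(as.integer(svmPred > 0) != y)     # linear kernel
mean(as.integer(svmPred.2 > 0) != y)   # polynomial (degree 1) kernel
mean(as.integer(svmPred.3 > 0) != y)   # radial kernel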

Problem 7 - Part A

library(ISLR)  # provides the Auto and OJ data sets

auto <- Auto
attach(auto)

In this chunk, I am creating a binary variable that indicates whether a car's gas mileage is above the median.

mpgMed <- ifelse(mpg > median(mpg),1,0)
auto$mpgMed <- as.factor(mpgMed)
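
As a quick sanity check (a sketch, not part of the original analysis), tabulating the new variable should show a roughly even split around the median:

table(auto$mpgMed)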

Problem 7 - Part B

set.seed(1)
svmTune <- tune(svm, mpgMed~., data=auto, ranges=list(cost = c(0.1, 0.2, 0.5, 1, 2, 10)), kernel="linear")

The results of the support vector classifier with cost = 0.1, 0.2, 0.5, 1, 2 and 10 indicate that the lowest cross-validation error is associated with the model using cost = 1, which has an error rate of approximately 1.03%. Conversely, the highest cross-validation error is associated with the model using cost = 0.1, which has an error rate of approximately 4.60%.

summary(svmTune)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.01025641 
## 
## - Detailed performance results:
##   cost      error dispersion
## 1  0.1 0.04596154 0.03378238
## 2  0.2 0.02814103 0.01893035
## 3  0.5 0.01282051 0.01813094
## 4  1.0 0.01025641 0.01792836
## 5  2.0 0.01282051 0.02179068
## 6 10.0 0.02051282 0.02648194
svmTune$best.model
## 
## Call:
## best.tune(method = svm, train.x = mpgMed ~ ., data = auto, ranges = list(cost = c(0.1, 
##     0.2, 0.5, 1, 2, 10)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
## 
## Number of Support Vectors:  56

Problem 7 - Part C

The results of the support vector classifier with the polynomial kernel indicate that the lowest cross-validation error is associated with the model using cost = 5 and degree = 1, which has an error rate of approximately 7.40%. The results of the support vector classifier with the radial kernel indicate that the lowest cross-validation error is associated with the model using cost = 3 and gamma = 1, which has an error rate of approximately 5.88%.

set.seed(1)
svmTune.pol <- tune(svm, mpgMed~., data=auto, ranges=list(cost=c(0.1, 0.4, 0.8, 1, 3, 5), degree=c(1,2,3)), kernel="polynomial")
summary(svmTune.pol)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##     5      1
## 
## - best performance: 0.07403846 
## 
## - Detailed performance results:
##    cost degree      error dispersion
## 1   0.1      1 0.19673077 0.11502574
## 2   0.4      1 0.09185897 0.04376958
## 3   0.8      1 0.08673077 0.04846618
## 4   1.0      1 0.08416667 0.04343030
## 5   3.0      1 0.07653846 0.03617137
## 6   5.0      1 0.07403846 0.03522110
## 7   0.1      2 0.55115385 0.04366593
## 8   0.4      2 0.55115385 0.04366593
## 9   0.8      2 0.55115385 0.04366593
## 10  1.0      2 0.55115385 0.04366593
## 11  3.0      2 0.55115385 0.04366593
## 12  5.0      2 0.55115385 0.04366593
## 13  0.1      3 0.55115385 0.04366593
## 14  0.4      3 0.55115385 0.04366593
## 15  0.8      3 0.55115385 0.04366593
## 16  1.0      3 0.55115385 0.04366593
## 17  3.0      3 0.55115385 0.04366593
## 18  5.0      3 0.55115385 0.04366593
svmTune.pol$best.model
## 
## Call:
## best.tune(method = svm, train.x = mpgMed ~ ., data = auto, ranges = list(cost = c(0.1, 
##     0.4, 0.8, 1, 3, 5), degree = c(1, 2, 3)), kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  5 
##      degree:  1 
##      coef.0:  0 
## 
## Number of Support Vectors:  132
set.seed(1)
svmTune.rad <- tune(svm, mpgMed~., data=auto, ranges=list(cost=c(0.1, 0.4, 0.8, 1, 3, 5), gamma=c(1,2,3)), kernel="radial")
summary(svmTune.rad)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     3     1
## 
## - best performance: 0.05884615 
## 
## - Detailed performance results:
##    cost gamma      error dispersion
## 1   0.1     1 0.55115385 0.04366593
## 2   0.4     1 0.09935897 0.06326807
## 3   0.8     1 0.07147436 0.04312562
## 4   1.0     1 0.06384615 0.04375618
## 5   3.0     1 0.05884615 0.04020934
## 6   5.0     1 0.05884615 0.04020934
## 7   0.1     2 0.55115385 0.04366593
## 8   0.4     2 0.54608974 0.04574092
## 9   0.8     2 0.36743590 0.13801791
## 10  1.0     2 0.14019231 0.07984711
## 11  3.0     2 0.13512821 0.08055403
## 12  5.0     2 0.13512821 0.08055403
## 13  0.1     3 0.55115385 0.04366593
## 14  0.4     3 0.55115385 0.04366593
## 15  0.8     3 0.50006410 0.05856451
## 16  1.0     3 0.41326923 0.14331350
## 17  3.0     3 0.38025641 0.14908523
## 18  5.0     3 0.38025641 0.14908523
svmTune.rad$best.model
## 
## Call:
## best.tune(method = svm, train.x = mpgMed ~ ., data = auto, ranges = list(cost = c(0.1, 
##     0.4, 0.8, 1, 3, 5), gamma = c(1, 2, 3)), kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  3 
## 
## Number of Support Vectors:  377

Problem 7 - Part D

The three plots below are generated for the best SVM model with a linear basis kernel.

plot(svmTune$best.model, data=auto, mpg~horsepower)

plot(svmTune$best.model, data=auto, mpg~year)

plot(svmTune$best.model, data=auto, mpg~displacement)

The three plots below are generated for the best SVM model with a polynomial basis kernel.

plot(svmTune.pol$best.model, data=auto, mpg~horsepower)

plot(svmTune.pol$best.model, data=auto, mpg~year)

plot(svmTune.pol$best.model, data=auto, mpg~displacement)

The three plots below are generated for the best SVM model with a radial basis kernel.

plot(svmTune.rad$best.model, data=auto, mpg~horsepower)

plot(svmTune.rad$best.model, data=auto, mpg~year)

plot(svmTune.rad$best.model, data=auto, mpg~displacement)

Problem 8 - Part A

oj <- OJ
attach(oj)

In this chunk, I am creating a training set of 800 observations and a test set of the remaining observations.

set.seed(1)
split <- sample(1:nrow(oj), 800)
ojTrain <- oj[split,]
ojTest <- oj[-split,]
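
A quick check of the split sizes (a sketch; the dimensions are not reported in the original output):

dim(ojTrain)   # should have 800 rows
dim(ojTest)    # should have nrow(oj) - 800 rows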

Problem 8 - Part B

The results of the support vector classifier indicate that there are 435 support vectors out of the total 800 points in the training set, of which 219 belong to the class CH and 216 belong to the class MM.

ojSVM <- svm(Purchase~., data=ojTrain, cost=0.01, kernel="linear")
summary(ojSVM)
## 
## Call:
## svm(formula = Purchase ~ ., data = ojTrain, cost = 0.01, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Problem 8 - Part C

The training error rate is 17.5% and the test error rate is approximately 17.8%.

ojPred.train <- predict(ojSVM, ojTrain)
table(ojTrain$Purchase, ojPred.train)
##     ojPred.train
##       CH  MM
##   CH 420  65
##   MM  75 240
mean(ojPred.train != ojTrain$Purchase)
## [1] 0.175
ojPred.test <- predict(ojSVM, ojTest)
table(ojTest$Purchase, ojPred.test)
##     ojPred.test
##       CH  MM
##   CH 153  15
##   MM  33  69
mean(ojPred.test != ojTest$Purchase)
## [1] 0.1777778

Problem 8 - Part D

The results of the support vector classifier with cost set to a range of values between 0.1 and 10 indicate that the lowest cross-validation error is associated with the model using cost = 0.5, which has an error rate of approximately 16.88%.

set.seed(1)
ojTune <- tune(svm, Purchase~., data=ojTrain, ranges=list(cost=c(0.1,0.2,0.5,0.8,1,2,3,5,8,10)), kernel="linear")
summary(ojTune)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.5
## 
## - best performance: 0.16875 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1   0.1 0.17250 0.03162278
## 2   0.2 0.17125 0.02829041
## 3   0.5 0.16875 0.02651650
## 4   0.8 0.16875 0.02779513
## 5   1.0 0.17500 0.02946278
## 6   2.0 0.17250 0.02874698
## 7   3.0 0.16875 0.03019037
## 8   5.0 0.17250 0.03162278
## 9   8.0 0.17375 0.03197764
## 10 10.0 0.17375 0.03197764
ojTune$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = ojTrain, ranges = list(cost = c(0.1, 
##     0.2, 0.5, 0.8, 1, 2, 3, 5, 8, 10)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.5 
## 
## Number of Support Vectors:  332

Problem 8 - Part E

The training error rate for the best model from part D is 16.5% and the test error rate is approximately 15.6%.

ojPred.train.2 <- predict(ojTune$best.model, ojTrain)
table(ojTrain$Purchase, ojPred.train.2)
##     ojPred.train.2
##       CH  MM
##   CH 424  61
##   MM  71 244
mean(ojPred.train.2 != ojTrain$Purchase)
## [1] 0.165
ojPred.test.2 <- predict(ojTune$best.model, ojTest)
table(ojTest$Purchase, ojPred.test.2)
##     ojPred.test.2
##       CH  MM
##   CH 155  13
##   MM  29  73
mean(ojPred.test.2 != ojTest$Purchase)
## [1] 0.1555556

Problem 8 - Part F

The results of the support vector classifier with a radial kernel and cost set to 0.01 indicate that there are 634 support vectors out of the total 800 points in the training set, of which 319 belong to the class CH and 315 belong to the class MM.

ojSVM.rad <- svm(Purchase~., data=ojTrain, cost=0.01, kernel="radial")
summary(ojSVM.rad)
## 
## Call:
## svm(formula = Purchase ~ ., data = ojTrain, cost = 0.01, kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.01 
## 
## Number of Support Vectors:  634
## 
##  ( 319 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

The training error rate is approximately 39.38% and the test error rate is approximately 37.78%.

ojPred.rad.train <- predict(ojSVM.rad, ojTrain)
table(ojTrain$Purchase, ojPred.rad.train)
##     ojPred.rad.train
##       CH  MM
##   CH 485   0
##   MM 315   0
mean(ojPred.rad.train != ojTrain$Purchase)
## [1] 0.39375
ojPred.rad.test <- predict(ojSVM.rad, ojTest)
table(ojTest$Purchase, ojPred.rad.test)
##     ojPred.rad.test
##       CH  MM
##   CH 168   0
##   MM 102   0
mean(ojPred.rad.test != ojTest$Purchase)
## [1] 0.3777778

The results of the support vector classifier with a radial kernel and cost set to a range of values between 0.1 and 10 indicate that the lowest cross-validation error is associated with the model using cost = 0.5, which has an error rate of 16.75%.

set.seed(1)
ojTune.rad <- tune(svm, Purchase~., data=ojTrain, ranges=list(cost=c(0.1,0.2,0.5,0.8,1,2,3,5,8,10)), kernel="radial")
summary(ojTune.rad)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.5
## 
## - best performance: 0.1675 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1   0.1 0.18625 0.02853482
## 2   0.2 0.18250 0.03238227
## 3   0.5 0.16750 0.02443813
## 4   0.8 0.16875 0.02517301
## 5   1.0 0.17125 0.02128673
## 6   2.0 0.17750 0.02188988
## 7   3.0 0.17625 0.02239947
## 8   5.0 0.18000 0.02220485
## 9   8.0 0.18250 0.02648375
## 10 10.0 0.18625 0.02853482
ojTune.rad$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = ojTrain, ranges = list(cost = c(0.1, 
##     0.2, 0.5, 0.8, 1, 2, 3, 5, 8, 10)), kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.5 
## 
## Number of Support Vectors:  407

The training error rate for the best model is 14.75% and the test error rate is approximately 17.78%.

ojPred.rad.train.2 <- predict(ojTune.rad$best.model, ojTrain)
table(ojTrain$Purchase, ojPred.rad.train.2)
##     ojPred.rad.train.2
##       CH  MM
##   CH 438  47
##   MM  71 244
mean(ojPred.rad.train.2 != ojTrain$Purchase)
## [1] 0.1475
ojPred.rad.test.2 <- predict(ojTune.rad$best.model, ojTest)
table(ojTest$Purchase, ojPred.rad.test.2)
##     ojPred.rad.test.2
##       CH  MM
##   CH 150  18
##   MM  30  72
mean(ojPred.rad.test.2 != ojTest$Purchase)
## [1] 0.1777778

Problem 8 - Part G

The results of the support vector classifier with a polynomial kernel, cost set to 0.01 and degree set to 2 indicate that there are 636 support vectors out of the total 800 points in the training set, of which 321 belong to the class CH and 315 belong to the class MM.

ojSVM.pol <- svm(Purchase~., data=ojTrain, cost=0.01, degree=2, kernel="polynomial")
summary(ojSVM.pol)
## 
## Call:
## svm(formula = Purchase ~ ., data = ojTrain, cost = 0.01, degree = 2, 
##     kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.01 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  636
## 
##  ( 321 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

The training error rate is 37.25% and the test error rate is approximately 36.67%.

ojPred.pol.train <- predict(ojSVM.pol, ojTrain)
table(ojTrain$Purchase, ojPred.pol.train)
##     ojPred.pol.train
##       CH  MM
##   CH 484   1
##   MM 297  18
mean(ojPred.pol.train != ojTrain$Purchase)
## [1] 0.3725
ojPred.pol.test <- predict(ojSVM.pol, ojTest)
table(ojTest$Purchase, ojPred.pol.test)
##     ojPred.pol.test
##       CH  MM
##   CH 167   1
##   MM  98   4
mean(ojPred.pol.test != ojTest$Purchase)
## [1] 0.3666667

The results of the support vector classifier with a polynomial kernel, degree set to 2, and cost set to a range of values between 0.1 and 10 indicate that the lowest cross-validation error is associated with the model using cost = 3, which has an error rate of approximately 17.63%.

set.seed(1)
ojTune.pol <- tune(svm, Purchase~., data=ojTrain, ranges=list(cost=c(0.1,0.2,0.5,0.8,1,2,3,5,8,10)), degree=2, kernel="polynomial")
summary(ojTune.pol)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     3
## 
## - best performance: 0.17625 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1   0.1 0.32125 0.05001736
## 2   0.2 0.22625 0.03557562
## 3   0.5 0.20625 0.04050463
## 4   0.8 0.20375 0.04251225
## 5   1.0 0.20250 0.04116363
## 6   2.0 0.18125 0.04177070
## 7   3.0 0.17625 0.03793727
## 8   5.0 0.18250 0.03496029
## 9   8.0 0.18000 0.03395258
## 10 10.0 0.18125 0.02779513
ojTune.pol$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = ojTrain, ranges = list(cost = c(0.1, 
##     0.2, 0.5, 0.8, 1, 2, 3, 5, 8, 10)), degree = 2, kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  3 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  384

The training error rate for the best model is approximately 15.38% and the test error rate is approximately 20.37%.

ojPred.pol.train.2 <- predict(ojTune.pol$best.model, ojTrain)
table(ojTrain$Purchase, ojPred.pol.train.2)
##     ojPred.pol.train.2
##       CH  MM
##   CH 452  33
##   MM  90 225
mean(ojPred.pol.train.2 != ojTrain$Purchase)
## [1] 0.15375
ojPred.pol.test.2 <- predict(ojTune.pol$best.model, ojTest)
table(ojTest$Purchase, ojPred.pol.test.2)
##     ojPred.pol.test.2
##       CH  MM
##   CH 153  15
##   MM  40  62
mean(ojPred.pol.test.2 != ojTest$Purchase)
## [1] 0.2037037

Problem 8 - Part H

Comparing the best model for each kernel, the radial kernel gives the lowest training error rate (14.75%), but the linear kernel gives the lowest test error rate (approximately 15.56%). Since the tuned models otherwise perform similarly and test error is the better measure of how a classifier generalizes, the linear kernel appears to provide the best results for this data.
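
As a supporting sketch (assuming the prediction objects from parts E, F and G are still in the workspace), the tuned models' error rates can be collected side by side:

data.frame(
  kernel = c("linear", "radial", "polynomial"),
  train  = c(mean(ojPred.train.2     != ojTrain$Purchase),
             mean(ojPred.rad.train.2 != ojTrain$Purchase),
             mean(ojPred.pol.train.2 != ojTrain$Purchase)),
  test   = c(mean(ojPred.test.2      != ojTest$Purchase),
             mean(ojPred.rad.test.2  != ojTest$Purchase),
             mean(ojPred.pol.test.2  != ojTest$Purchase))
)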