PROBLEM 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

  1. Generating a dataset
set.seed(1)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1*(x1^2 - x2^2 > 0)
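
By construction, the true class boundary is x1^2 = x2^2, i.e. the two lines x2 = x1 and x2 = -x1, which no linear function of x1 and x2 can reproduce. A quick look at the class balance (a sketch, not part of the original chunk):
table(y)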
  2. Plotting the observations
plot(x1, x2, col = y+2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)
title("Plotting the data simulation")

  3. Fitting a logistic regression model
fit.logit <- glm(y ~ x1 + x2, family = binomial)
summary(fit.logit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.179  -1.139  -1.112   1.206   1.257  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260   0.089579  -0.974    0.330
## x1           0.196199   0.316864   0.619    0.536
## x2          -0.002854   0.305712  -0.009    0.993
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.18  on 499  degrees of freedom
## Residual deviance: 691.79  on 497  degrees of freedom
## AIC: 697.79
## 
## Number of Fisher Scoring iterations: 3
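
As expected, neither X1 nor X2 is statistically significant: the true boundary depends on the features only through quadratic terms, which this purely linear specification cannot capture. A sketch to pull out just the p-values from the summary:
coef(summary(fit.logit))[, "Pr(>|z|)"]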
  4. Applying the model to the training data and plotting the predicted class labels
logit.probs <- predict(fit.logit, type = "response")
logit.pred <- rep(0, length(logit.probs))
logit.pred[logit.probs > 0.50] = 1
table(logit.pred, y)
##           y
## logit.pred   0   1
##          0 258 212
##          1   3  27
mean(logit.pred != y)
## [1] 0.43
plot(x1, x2, col = logit.pred + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)

The linear logistic model assigns nearly every observation to class 0 (training error rate 43%), and the decision boundary it produces is indeed linear.
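
To make the boundary explicit, a sketch that overlays it on the previous plot: setting the linear predictor to zero gives x2 = -(b0 + b1*x1)/b2.
cf <- coef(fit.logit)
# Sketch: draw the fitted linear boundary b0 + b1*x1 + b2*x2 = 0.
abline(a = -cf[1]/cf[3], b = -cf[2]/cf[3], lwd = 2)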

  5. Fitting a logistic regression model using non-linear functions of X1 and X2
fit.logit2 = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1*x2), family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
logit.probs2 = predict(fit.logit2, type = "response")
logit.pred2 = rep(0, length(logit.probs2))
logit.pred2[logit.probs2 > 0.50] = 1
table(logit.pred2, y)
##            y
## logit.pred2   0   1
##           0 261   0
##           1   0 239
mean(logit.pred2 != y)
## [1] 0

The model now fits the training data perfectly, with a training error rate of 0%. The glm.fit warnings above indicate that the two classes are perfectly separable in these transformed features, so the coefficient estimates diverge and the fitted probabilities are numerically 0 or 1.

  6. Applying the model to the training data to obtain a predicted class label for each training observation, and plotting the labels
plot(x1, x2, col = logit.pred2 + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)

The decision boundary is now clearly non-linear.
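
The boundary can also be traced directly by predicting over a fine grid and drawing the 0.5 probability contour (a sketch; grid and grid.probs are hypothetical names):
grid <- expand.grid(x1 = seq(-0.5, 0.5, length.out = 200),
                    x2 = seq(-0.5, 0.5, length.out = 200))
grid.probs <- predict(fit.logit2, newdata = grid, type = "response")
# Sketch: rows of the probability matrix correspond to x1 values, columns to x2.
contour(unique(grid$x1), unique(grid$x2), matrix(grid.probs, 200, 200),
        levels = 0.5, add = TRUE, drawlabels = FALSE)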

  7. Support Vector Classifier
library(e1071)
dat = data.frame(x2 = x2, x1 = x1, y = as.factor(y))
svmfit.linear = svm(y~., data = dat, kernel = "linear")
plot(svmfit.linear, data = dat)
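
For later comparison, the training error rate of the linear SVM can be checked directly (a sketch, output not shown; predict() with no newdata returns the fitted training labels):
mean(predict(svmfit.linear) != dat$y)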

  8. SVM using a non-linear (radial) kernel
svmfit.rad = svm(y ~ ., data = data.frame(x1 = x1, x2 = x2, y = as.factor(y)), kernel = "radial", gamma = 1, cost = 1)
plot(svmfit.rad, data = dat)
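
And the corresponding check for the radial-kernel fit (again a sketch, output not shown):
mean(predict(svmfit.rad) != dat$y)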

  9. The true relationship between the predictors and the response is non-linear (the boundary is x1^2 = x2^2), so the SVM with a radial kernel captures the class boundary well, while the linear support vector classifier cannot, as the training-error checks above suggest.

PROBLEM 8

This problem involves the OJ data set which is part of the ISLR package.

  1. Creating a training set
library(ISLR)
library(e1071)
attach(OJ)
set.seed(1)
train = sample(1:nrow(OJ), 800)
train.Y = Purchase[train]
test.Y = Purchase[-train]
test.X = OJ[-train, ]
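
The OJ data set has 1070 rows, so the remaining 270 observations form the test set. A quick sanity check (a sketch):
c(train = length(train), test = nrow(test.X))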
  2. Fitting a support vector classifier to the training data
svmfit = svm(Purchase~., data = OJ[train, ], kernel = "linear", cost = 0.01)
summary(svmfit)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ[train, ], kernel = "linear", 
##     cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
##       gamma:  0.05555556 
## 
## Number of Support Vectors:  432
## 
##  ( 215 217 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
  3. Training and test error rates
train.pred <- predict(svmfit)
mean(train.pred != train.Y)
## [1] 0.16625
test.pred = predict(svmfit, newdata = test.X)
mean(test.pred != test.Y)
## [1] 0.1814815

Training error rate = 0.16625; test error rate ≈ 0.1815.
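
Since the same two computations recur for every model below, a small helper can report both rates at once (err_rates is a hypothetical name, not in the original code):
# Sketch: training and test error rates for a fitted svm object;
# predict() with no newdata returns the fitted training predictions.
err_rates <- function(model) {
  c(train = mean(predict(model) != train.Y),
    test  = mean(predict(model, newdata = test.X) != test.Y))
}
err_rates(svmfit)  # should reproduce the two rates above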

  4. Using the tune() function to select an optimal cost
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "linear", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.15625 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.16250 0.04124790
## 2  0.03 0.15750 0.04533824
## 3  0.10 0.16000 0.04401704
## 4  0.30 0.16000 0.04158325
## 5  1.00 0.15625 0.04611655
## 6  3.00 0.16125 0.04619178
## 7 10.00 0.16625 0.05036326
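
e1071 also provides a plot method for tune objects, which displays the cross-validation error as a function of cost (a sketch, plot not shown):
plot(tune.out)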
  5. Error rates for the tuned linear model
train_pred.linear = predict(tune.out$best.model)
mean(train_pred.linear != train.Y)
## [1] 0.15875
test_pred.linear = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.linear != test.Y)
## [1] 0.1925926

Training error rate = 0.15875; test error rate ≈ 0.1926. Tuning lowers the training error slightly, but here the test error is a bit higher than with cost = 0.01.

  6. SVM with a radial kernel
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "radial", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.3
## 
## - best performance: 0.17 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.38250 0.04048319
## 2  0.03 0.38000 0.04417453
## 3  0.10 0.17875 0.05036326
## 4  0.30 0.17000 0.04721405
## 5  1.00 0.17125 0.04210189
## 6  3.00 0.17500 0.04208127
## 7 10.00 0.17250 0.03670453
train_pred.rad = predict(tune.out$best.model)
mean(train_pred.rad != train.Y)
## [1] 0.155
test_pred.rad = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.rad != test.Y)
## [1] 0.162963

Training error rate = 0.155; test error rate ≈ 0.1630. (gamma was left at its e1071 default of 1 divided by the data dimension.)

  7. SVM with a polynomial kernel, degree = 2
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "polynomial", degree = 2, ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.1775 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.38250 0.04048319
## 2  0.03 0.36250 0.04639804
## 3  0.10 0.32625 0.05787019
## 4  0.30 0.22625 0.03508422
## 5  1.00 0.19375 0.04573854
## 6  3.00 0.18875 0.03884174
## 7 10.00 0.17750 0.02993047
train_pred.poly = predict(tune.out$best.model)
mean(train_pred.poly != train.Y)
## [1] 0.145
test_pred.poly = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.poly != test.Y)
## [1] 0.1851852

Training error rate = 0.145; test error rate ≈ 0.1852.
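
Collecting the test predictions computed above makes the comparison explicit (a sketch using objects already in the workspace):
data.frame(kernel = c("linear (tuned)", "radial (tuned)", "polynomial, degree 2 (tuned)"),
           test.error = c(mean(test_pred.linear != test.Y),
                          mean(test_pred.rad != test.Y),
                          mean(test_pred.poly != test.Y)))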

  8. The radial-kernel SVM gives the lowest test error rate (≈ 0.163) and therefore the best results on this data set; the polynomial kernel fits the training data best but generalizes slightly worse, and the tuned linear classifier is close behind.