Question 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

Question 5a

Generate a data set with n=500 and p=2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows.

x1=runif(500)-0.5
x2=runif(500)-0.5
y=1*(x1^2-x2^2>0)

Question 5b

Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.

test.data=data.frame(x1=x1,x2=x2,y=as.factor(y))
plot(x1,x2,col=(2-y))

Question 5c

Fit a logistic regression model to the data, using x1 and x2 as predictors

test.data.glm.fit=glm(y~x1+x2,data = test.data,family = "binomial")

Question 5d

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

test.data.glm.probs = predict(test.data.glm.fit, newdata=test.data, type = 'response')
test.data.glm.preds = rep(0,500)
test.data.glm.preds[test.data.glm.probs>0.50] = 1
test.data.confm=table(preds=test.data.glm.preds, truth=test.data$y)
plot(x1,x2,col=2-test.data.glm.preds)

Calculate the error rate

(test.data.errorrate=(test.data.confm[1,2]+test.data.confm[2,1])/nrow(test.data))
## [1] 0.354

As the plot shows, the decision boundary is linear. The training error rate is high at 35.4%, which is expected because a linear boundary cannot capture the quadratic boundary used to generate the 500 observations.
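
The error rate read off the confusion matrix above equals the share of observations whose predicted label differs from the true one. A minimal sketch of that equivalence, using a hypothetical helper `error_rate` and made-up label vectors (neither appears in the original analysis):

```r
# Hypothetical helper: fraction of predictions that disagree with the truth.
error_rate <- function(pred, truth) {
  mean(as.character(pred) != as.character(truth))
}

# Made-up labels for illustration only.
truth <- factor(c(0, 1, 1, 0, 1, 0, 0, 1))
pred  <- c(0, 1, 0, 0, 1, 1, 0, 1)

# Same quantity via the off-diagonal cells of the confusion matrix.
confm <- table(preds = pred, truth = truth)
from_table <- (confm[1, 2] + confm[2, 1]) / length(truth)

error_rate(pred, truth)
from_table
```

Unlike indexing the table, the mean-based form still works when a model predicts only one class and the table therefore has a single row.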

Question 5e

Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.

test.data.glm.fit1= glm(y~I(x1^2)+I(x2^2),data=test.data,family="binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
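
These warnings are expected here rather than a sign of a coding mistake: the classes are perfectly separated by the sign of x1^2 - x2^2, so the likelihood keeps improving as the coefficients grow and the fitting routine never settles on finite estimates. A minimal sketch on freshly simulated data (the seed and the `.demo` names are assumptions, not part of the original run):

```r
set.seed(1)  # assumed seed, for reproducibility of this sketch only
x1.demo <- runif(500) - 0.5
x2.demo <- runif(500) - 0.5
y.demo  <- 1 * (x1.demo^2 - x2.demo^2 > 0)

# The transformed features separate the classes exactly:
# class 1 if and only if x1^2 > x2^2.
separable <- all((x1.demo^2 > x2.demo^2) == (y.demo == 1))
separable

# Under perfect separation glm() emits the same kind of warnings as above,
# so they are suppressed here and the coefficients inspected instead.
fit.demo <- suppressWarnings(
  glm(y.demo ~ I(x1.demo^2) + I(x2.demo^2), family = "binomial")
)
coef(fit.demo)
```

Coefficients of very large magnitude on the squared terms are the usual numerical symptom of separation, and they are consistent with a model that classifies the training data without error.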

Question 5f

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be non-linear.

test.data.glm1.probs = predict(test.data.glm.fit1, newdata=test.data, type = 'response')
test.data.glm1.preds = rep(0,500)
test.data.glm1.preds[test.data.glm1.probs>0.50] = 1
test.data.confm1=table(preds=test.data.glm1.preds, truth=test.data$y)
plot(x1,x2,col=2-test.data.glm1.preds)

(test.data.errorrate1=(test.data.confm1[1,2]+test.data.confm1[2,1])/nrow(test.data))
## [1] 0

Question 5g

Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels.

test.data.svml.fit= svm(y~.,data = test.data,kernel="linear",cost=10,scale=FALSE)
test.data.svml.pred = predict(test.data.svml.fit, newdata=test.data, type = 'response')
plot(x1,x2,col=test.data.svml.pred)

test.data.svml.confm=table(preds=test.data.svml.pred, truth=test.data$y)
(test.data.svml.errorrate=(test.data.svml.confm[1,2]+test.data.svml.confm[2,1])/nrow(test.data))
## [1] 0.474

Question 5h

Fit an SVM using a non-linear kernel to the data with X1 and X2 as predictors. Obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels.

test.data.svmr.fit= svm(y~.,data = test.data,kernel="radial",gamma=1,cost=10,scale=FALSE)
test.data.svmr.pred = predict(test.data.svmr.fit, newdata=test.data, type = 'response')
plot(x1,x2,col=test.data.svmr.pred)

test.data.svmr.confm=table(preds=test.data.svmr.pred, truth=test.data$y)
(test.data.svmr.errorrate=(test.data.svmr.confm[1,2]+test.data.svmr.confm[2,1])/nrow(test.data))
## [1] 0.022

Question 5i

Comment on your results

  • Based on the results, logistic regression with non-linear functions of the predictors and the SVM with a non-linear (radial) kernel give a much lower training error rate (0 and 0.022) than logistic regression with a linear decision boundary and the support vector classifier with a linear kernel (0.354 and 0.474).

Question 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

data(Auto)

Question 7a

Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

set.seed(222)
auto.length = length(Auto$mpg)
auto.mpg.median = median(Auto$mpg)
# 1 if mpg is above the median, 0 otherwise (vectorised comparison)
Auto.mpg01=as.numeric(Auto$mpg > auto.mpg.median)
auto.new= data.frame(Auto,Auto.mpg01)

auto.new$Auto.mpg01=as.factor(auto.new$Auto.mpg01)
str(auto.new)
## 'data.frame':    392 obs. of  10 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
##  $ Auto.mpg01  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

Question 7b

Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage.

autolinear.tune=tune(svm,Auto.mpg01~.,data=auto.new,kernel='linear',
                 ranges=list(cost=c(0.001,0.01,0.1,1,5,10,100,1000)))
summary(autolinear.tune)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.01275641 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-03 0.09955128 0.04760888
## 2 1e-02 0.07666667 0.04375200
## 3 1e-01 0.04596154 0.02359743
## 4 1e+00 0.01275641 0.01808165
## 5 5e+00 0.01532051 0.01318724
## 6 1e+01 0.01788462 0.01234314
## 7 1e+02 0.03057692 0.01606420
## 8 1e+03 0.03057692 0.01606420

According to the summary, the best performance (a cross-validation error of about 0.013) occurs at cost=1. The error decreases as cost rises from 0.001 to 1, then increases again for larger values of cost.
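
How the error moves with cost can be read directly from the detailed results. A small sketch that re-enters the cost and error columns printed above into a data frame (the name `cv.results` is an assumption for illustration) and locates the minimum:

```r
# Cost and cross-validation error copied from the tune() summary above.
cv.results <- data.frame(
  cost  = c(0.001, 0.01, 0.1, 1, 5, 10, 100, 1000),
  error = c(0.09955128, 0.07666667, 0.04596154, 0.01275641,
            0.01532051, 0.01788462, 0.03057692, 0.03057692)
)

# Cost with the smallest cross-validation error.
best.cost <- cv.results$cost[which.min(cv.results$error)]
best.cost

# Change in error from one cost value to the next:
# negative up to cost = 1, non-negative afterwards.
diff(cv.results$error)
```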

Question 7c:

Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma, degree and cost.

set.seed(222)
# Radial kernel with various values of gamma and cost.
auto.radial.tune=tune(svm, Auto.mpg01~., data=auto.new, kernel='radial',
                 ranges=list(cost=c(0.1,1,10,100,1000),gamma=c(0.5,1,2,3,4)))
auto.radial.tune$best.parameters
##   cost gamma
## 2    1   0.5
auto.radial.tune$best.performance
## [1] 0.04602564

The cross-validation error is lowest for the radial model with cost=1 and gamma=0.5, but at 0.046 it is about 3.6 times the error of the best linear model (0.013).

# Polynomial kernel with various degrees
set.seed(222)
auto.poly.tune = tune(svm, Auto.mpg01~., data=auto.new, kernel='polynomial',
                 ranges=list(cost=c(0.1,1,10,100,1000), degree=c(1,2,3,4,5)))
auto.poly.tune$best.parameters
##   cost degree
## 5 1000      1
auto.poly.tune$best.performance
## [1] 0.01019231

The best polynomial model has degree=1 and cost=1000. The lowest cross-validation errors are given by the linear SVM and the polynomial kernel with degree=1, which suggests that the true decision boundary is approximately linear. These models would need to be evaluated on a held-out test set to properly ascertain which is best.
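
One caveat about these fits: the formula `Auto.mpg01~.` uses every other column of `auto.new` as a predictor, including `mpg`, from which the response was derived, and `name`, a factor with 304 levels. The very low cross-validation errors may therefore partly reflect the classifier recovering the median split from `mpg` itself. A minimal sketch of dropping both columns before fitting, shown on a small made-up stand-in data frame (`auto.demo` and its values are hypothetical, and the commented `tune()` call is an untested suggestion, not something the original analysis ran):

```r
# Hypothetical stand-in with the same kinds of columns as auto.new.
auto.demo <- data.frame(
  mpg        = c(18, 15, 30, 36, 24, 14),
  weight     = c(3504, 3693, 2130, 1800, 2600, 4300),
  horsepower = c(130, 165, 70, 60, 95, 215),
  name       = c("a", "b", "c", "d", "e", "f")
)
auto.demo$Auto.mpg01 <- as.factor(as.numeric(auto.demo$mpg > median(auto.demo$mpg)))

# Drop the column that defines the response and the car name.
auto.clean <- subset(auto.demo, select = -c(mpg, name))

# Predictors that a formula such as Auto.mpg01 ~ . would now use.
predictors <- attr(terms(Auto.mpg01 ~ ., data = auto.clean), "term.labels")
predictors

# With e1071 loaded, the tuning call would then take the cleaned data, e.g.
# tune(svm, Auto.mpg01 ~ ., data = auto.clean, kernel = "linear",
#      ranges = list(cost = c(0.01, 0.1, 1, 10)))
```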

Question 8

This problem involves the OJ data set, which is part of the ISLR package.

Question 8a

Create a training set containing a random sample of 800 observations and a test set containing the remaining observations

set.seed(131)
# Training and test sets.
sample.data = sample.split(OJ$Purchase,SplitRatio = 0.8)
train.set = subset(OJ, sample.data==TRUE)
test.set = subset(OJ, sample.data==FALSE)
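
Note that `sample.split()` with `SplitRatio = 0.8` keeps roughly 80% of the rows, stratified by `Purchase`, which is not exactly the 800 observations the question asks for. A minimal base-R sketch of drawing exactly 800 row indices, using a stand-in row count `n.demo` in place of `nrow(OJ)` (1070 in the ISLR data) so the block runs without the package:

```r
set.seed(131)
n.demo <- 1070                      # stand-in for nrow(OJ)
train.idx <- sample(n.demo, 800)    # 800 distinct row numbers, drawn at random
test.idx  <- setdiff(seq_len(n.demo), train.idx)

length(train.idx)
length(test.idx)

# With the data loaded, the two sets would be taken as
# OJ[train.idx, ] and OJ[test.idx, ].
```

Switching to this split would change every result reported below, so the numbers in this section should be read as belonging to the 80% split.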

Question 8b

Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

svmfit = svm(Purchase~., data = train.set, kernel = "linear", cost=0.01)
summary(svmfit)
## 
## Call:
## svm(formula = Purchase ~ ., data = train.set, kernel = "linear", 
##     cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  461
## 
##  ( 230 231 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

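According to the summary, the fitted classifier uses 461 support vectors, 230 in one class and 231 in the other. With such a small cost (0.01) the margin is wide, so a large share of the training observations lie on or inside it and act as support vectors.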

Question 8c

What are the training and test error rates?

svm.pred = predict(svmfit, train.set)
linear.train.confm=table(predict=svm.pred, truth=train.set$Purchase)

Training error rate

# nrow() counts observations. Dividing by length(train.set), the 18 columns of OJ,
# gave 7.833333, i.e. 141 misclassified training observations.
(linear.train.errorrate=(linear.train.confm[1,2]+linear.train.confm[2,1])/nrow(train.set))
svm.pred.test = predict(svmfit, test.set)
linear.test.confm=table(predict=svm.pred.test, truth=test.set$Purchase)

Test error rate

# Dividing by length(test.set) gave 2.111111, i.e. 38 misclassified test observations.
(linear.test.errorrate=(linear.test.confm[1,2]+linear.test.confm[2,1])/nrow(test.set))
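
An error rate is a proportion and cannot exceed 1, so the denominator deserves a check. For a data frame, `length()` returns the number of columns, whereas `nrow()` returns the number of observations. A small sketch with a made-up data frame (`df.demo` is hypothetical):

```r
# Hypothetical data frame: 6 observations, 3 columns.
df.demo <- data.frame(a = 1:6, b = 6:1, y = c(0, 1, 0, 1, 1, 0))

length(df.demo)  # number of columns
nrow(df.demo)    # number of observations

# Suppose 2 of the 6 observations were misclassified.
misclassified <- 2
misclassified / length(df.demo)  # count per column, not an error rate
misclassified / nrow(df.demo)    # proportion of observations misclassified
```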

Question 8d

Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

set.seed(131)
tune.out = tune(svm, Purchase~., data = train.set, kernel = "linear", 
                ranges=list(cost=c(0.01,0.1,0.5,1,10)))
tune.out$best.parameters
##   cost
## 1 0.01
tune.out$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set, 
##     ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  461

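Question 8e

Compute the training and test error rates using this new value for cost.

svm.pred.bm = predict(tune.out$best.model, train.set)
linear.train.confm.bm=table(predict=svm.pred.bm, truth=train.set$Purchase)

Training error rate

# nrow() counts observations. Dividing by length(train.set), the 18 columns of OJ,
# gave 7.833333, i.e. 141 misclassified training observations.
(linear.train.errorrate.bm=(linear.train.confm.bm[1,2]+linear.train.confm.bm[2,1])/nrow(train.set))
svm.pred.test.bm = predict(tune.out$best.model, test.set)
linear.test.confm.bm=table(predict=svm.pred.test.bm, truth=test.set$Purchase)

Test error rate

# Dividing by length(test.set) gave 2.111111, i.e. 38 misclassified test observations.
(linear.test.errorrate.bm=(linear.test.confm.bm[1,2]+linear.test.confm.bm[2,1])/nrow(test.set))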

Question 8f

Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

set.seed(131)
radial.tune.out = tune(svm, Purchase~., data = train.set, kernel = "radial", 
                ranges=list(cost=c(0.01,0.1,0.5,1,10)))
radial.tune.out$best.parameters
##   cost
## 4    1
radial.tune.out$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set, 
##     ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "radial")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  388
svm.radial.pred.bm = predict(radial.tune.out$best.model, train.set)
radial.train.confm.bm=table(predict=svm.radial.pred.bm, truth=train.set$Purchase)

Training error rate

# nrow() counts observations. Dividing by length(train.set), the 18 columns of OJ,
# gave 7, i.e. 126 misclassified training observations.
(radial.train.errorrate.bm=(radial.train.confm.bm[1,2]+radial.train.confm.bm[2,1])/nrow(train.set))
svm.radial.pred.test.bm = predict(radial.tune.out$best.model, test.set)
radial.test.confm.bm=table(predict=svm.radial.pred.test.bm, truth=test.set$Purchase)

Test error rate

# Dividing by length(test.set) gave 2.166667, i.e. 39 misclassified test observations.
(radial.test.errorrate.bm=(radial.test.confm.bm[1,2]+radial.test.confm.bm[2,1])/nrow(test.set))

Question 8g

Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

set.seed(131)
poly.tune.out = tune(svm, Purchase~., data = train.set, kernel = "poly", 
                ranges=list(cost=c(0.01,0.1,0.5,1,10)),degree=2)
poly.tune.out$best.parameters
##   cost
## 5   10
poly.tune.out$best.model
## 
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set, 
##     ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "poly", 
##     degree = 2)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  10 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  357
svm.poly.pred.bm = predict(poly.tune.out$best.model, train.set)
poly.train.confm.bm=table(predict=svm.poly.pred.bm, truth=train.set$Purchase)

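Training error rate

# nrow() counts observations. Dividing by length(train.set), the 18 columns of OJ,
# gave 7, i.e. 126 misclassified training observations.
(poly.train.errorrate.bm=(poly.train.confm.bm[1,2]+poly.train.confm.bm[2,1])/nrow(train.set))
svm.poly.pred.test.bm = predict(poly.tune.out$best.model, test.set)
poly.test.confm.bm=table(predict=svm.poly.pred.test.bm, truth=test.set$Purchase)

Test error rate

# Dividing by length(test.set) gave 2.277778, i.e. 41 misclassified test observations.
(poly.test.errorrate.bm=(poly.test.confm.bm[1,2]+poly.test.confm.bm[2,1])/nrow(test.set))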

Question 8h

Overall, which approach seems to give the best results on this data?

  • The tuned radial and polynomial models both make fewer training errors (126 misclassified observations each, against 141) but more test errors (39 and 41, against 38) than the linear SVM. This suggests that both are overfitting the training set slightly when compared to the linear SVM, although differences of one to three test observations are small.

  • The linear SVM with the tuned cost has a test error rate that is slightly above its training error rate. The small increase is normal behaviour, and the fact that its test error is still below those of the radial and polynomial models supports the linear SVM being the best model here.
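
The error values computed with `length()` as the denominator are misclassification counts divided by the 18 columns of OJ, so multiplying by 18 recovers the counts and makes the comparison concrete. A small sketch (the data frame names `ratios` and `counts` are assumptions for illustration; the numbers are the ones this analysis produced):

```r
# Ratios obtained in the analysis (count / 18 columns of OJ).
ratios <- data.frame(
  model = c("linear", "radial", "polynomial"),
  train = c(7.833333, 7, 7),
  test  = c(2.111111, 2.166667, 2.277778)
)

n.cols <- 18  # number of columns in OJ, the value length() returned

# Recover the misclassification counts.
counts <- data.frame(
  model = ratios$model,
  train = round(ratios$train * n.cols),
  test  = round(ratios$test * n.cols)
)
counts
```

Dividing these counts by `nrow(train.set)` and `nrow(test.set)` gives the proper error rates. Because every model shares the same denominators, the ranking of the models is unchanged.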