PROBLEM 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

  1. Generating a dataset
set.seed(1)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1*(x1^2 - x2^2 > 0)
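
By construction, the true class boundary is x1^2 = x2^2, i.e. the two lines x2 = x1 and x2 = -x1, which no linear function of x1 and x2 can reproduce. A quick look at the class balance (a sketch, not part of the original chunk):
table(y)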
  2. Plotting the observations
plot(x1, x2, col = y+2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)
title("Plotting the data simulation")

  3. Fitting a logistic regression model
fit.logit <- glm(y ~ x1 + x2, family = binomial)
summary(fit.logit)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.179  -1.139  -1.112   1.206   1.257  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260   0.089579  -0.974    0.330
## x1           0.196199   0.316864   0.619    0.536
## x2          -0.002854   0.305712  -0.009    0.993
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.18  on 499  degrees of freedom
## Residual deviance: 691.79  on 497  degrees of freedom
## AIC: 697.79
## 
## Number of Fisher Scoring iterations: 3
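
As expected, neither X1 nor X2 is statistically significant: the true boundary depends on the features only through quadratic terms, which this purely linear specification cannot capture. A sketch to pull out just the p-values from the summary:
coef(summary(fit.logit))[, "Pr(>|z|)"]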
  4. Applying the model to the training data and plotting the predicted class labels
logit.probs <- predict(fit.logit, type = "response")
logit.pred <- rep(0, length(logit.probs))
logit.pred[logit.probs > 0.50] = 1
table(logit.pred, y)
##           y
## logit.pred   0   1
##          0 258 212
##          1   3  27
mean(logit.pred != y)
## [1] 0.43
plot(x1, x2, col = logit.pred + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)

The linear logistic model assigns nearly every observation to class 0 (training error rate 43%), and the decision boundary it produces is indeed linear.
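
To make the boundary explicit, a sketch that overlays it on the previous plot: setting the linear predictor to zero gives x2 = -(b0 + b1*x1)/b2.
cf <- coef(fit.logit)
# Sketch: draw the fitted linear boundary b0 + b1*x1 + b2*x2 = 0.
abline(a = -cf[1]/cf[3], b = -cf[2]/cf[3], lwd = 2)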

  5. Fitting a logistic regression model using non-linear functions of X1 and X2
fit.logit2 = glm(y ~ poly(x1, 2) + poly(x2, 2) + I(x1*x2), family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
logit.probs2 = predict(fit.logit2, type = "response")
logit.pred2 = rep(0, length(logit.probs2))
logit.pred2[logit.probs2 > 0.50] = 1
table(logit.pred2, y)
##            y
## logit.pred2   0   1
##           0 261   0
##           1   0 239
mean(logit.pred2 != y)
## [1] 0

The model now fits the training data perfectly, with a training error rate of 0%. The glm.fit warnings above indicate that the two classes are perfectly separable in these transformed features, so the coefficient estimates diverge and the fitted probabilities are numerically 0 or 1.

  6. Applying the model to the training data to obtain a predicted class label for each training observation, and plotting the labels
plot(x1, x2, col = logit.pred2 + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)

The decision boundary is now clearly non-linear.
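
The boundary can also be traced directly by predicting over a fine grid and drawing the 0.5 probability contour (a sketch; grid and grid.probs are hypothetical names):
grid <- expand.grid(x1 = seq(-0.5, 0.5, length.out = 200),
                    x2 = seq(-0.5, 0.5, length.out = 200))
grid.probs <- predict(fit.logit2, newdata = grid, type = "response")
# Sketch: rows of the probability matrix correspond to x1 values, columns to x2.
contour(unique(grid$x1), unique(grid$x2), matrix(grid.probs, 200, 200),
        levels = 0.5, add = TRUE, drawlabels = FALSE)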

  7. Support Vector Classifier
library(e1071)
dat = data.frame(x2 = x2, x1 = x1, y = as.factor(y))
svmfit.linear = svm(y~., data = dat, kernel = "linear")
plot(svmfit.linear, data = dat)
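
For later comparison, the training error rate of the linear SVM can be checked directly (a sketch, output not shown; predict() with no newdata returns the fitted training labels):
mean(predict(svmfit.linear) != dat$y)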

  8. SVM using a non-linear (radial) kernel
svmfit.rad = svm(y ~ ., data = data.frame(x1 = x1, x2 = x2, y = as.factor(y)), kernel = "radial", gamma = 1, cost = 1)
plot(svmfit.rad, data = dat)
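
And the corresponding check for the radial-kernel fit (again a sketch, output not shown):
mean(predict(svmfit.rad) != dat$y)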

  9. The true relationship between the predictors and the response is non-linear (the boundary is x1^2 = x2^2), so the SVM with a radial kernel captures the class boundary well, while the linear support vector classifier cannot, as the training-error checks above suggest.

PROBLEM 8

This problem involves the OJ data set which is part of the ISLR package.

  1. Creating a training set
library(ISLR)
library(e1071)
attach(OJ)
set.seed(1)
train = sample(1:nrow(OJ), 800)
train.Y = Purchase[train]
test.Y = Purchase[-train]
test.X = OJ[-train, ]
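
The OJ data set has 1070 rows, so the remaining 270 observations form the test set. A quick sanity check (a sketch):
c(train = length(train), test = nrow(test.X))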
  2. Fitting a support vector classifier to the training data
svmfit = svm(Purchase~., data = OJ[train, ], kernel = "linear", cost = 0.01)
summary(svmfit)
## 
## Call:
## svm(formula = Purchase ~ ., data = OJ[train, ], kernel = "linear", 
##     cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
##       gamma:  0.05555556 
## 
## Number of Support Vectors:  432
## 
##  ( 215 217 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
  3. Training and test error rates
train.pred <- predict(svmfit)
mean(train.pred != train.Y)
## [1] 0.16625
test.pred = predict(svmfit, newdata = test.X)
mean(test.pred != test.Y)
## [1] 0.1814815

Training error rate = 0.16625; test error rate ≈ 0.1815.
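
Since the same two computations recur for every model below, a small helper can report both rates at once (err_rates is a hypothetical name, not in the original code):
# Sketch: training and test error rates for a fitted svm object;
# predict() with no newdata returns the fitted training predictions.
err_rates <- function(model) {
  c(train = mean(predict(model) != train.Y),
    test  = mean(predict(model, newdata = test.X) != test.Y))
}
err_rates(svmfit)  # should reproduce the two rates above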

  4. Using the tune() function to select an optimal cost
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "linear", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.15625 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.16250 0.04124790
## 2  0.03 0.15750 0.04533824
## 3  0.10 0.16000 0.04401704
## 4  0.30 0.16000 0.04158325
## 5  1.00 0.15625 0.04611655
## 6  3.00 0.16125 0.04619178
## 7 10.00 0.16625 0.05036326
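
e1071 also provides a plot method for tune objects, which displays the cross-validation error as a function of cost (a sketch, plot not shown):
plot(tune.out)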
  5. Error rates for the tuned linear model
train_pred.linear = predict(tune.out$best.model)
mean(train_pred.linear != train.Y)
## [1] 0.15875
test_pred.linear = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.linear != test.Y)
## [1] 0.1925926

Training error rate = 0.15875; test error rate ≈ 0.1926. Tuning lowers the training error slightly, but here the test error is a bit higher than with cost = 0.01.

  6. SVM with a radial kernel
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "radial", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.3
## 
## - best performance: 0.17 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.38250 0.04048319
## 2  0.03 0.38000 0.04417453
## 3  0.10 0.17875 0.05036326
## 4  0.30 0.17000 0.04721405
## 5  1.00 0.17125 0.04210189
## 6  3.00 0.17500 0.04208127
## 7 10.00 0.17250 0.03670453
train_pred.rad = predict(tune.out$best.model)
mean(train_pred.rad != train.Y)
## [1] 0.155
test_pred.rad = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.rad != test.Y)
## [1] 0.162963

Training error rate = 0.155; test error rate ≈ 0.1630. (gamma was left at its e1071 default of 1 divided by the data dimension.)

  7. SVM with a polynomial kernel, degree = 2
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "polynomial", degree = 2, ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.1775 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.38250 0.04048319
## 2  0.03 0.36250 0.04639804
## 3  0.10 0.32625 0.05787019
## 4  0.30 0.22625 0.03508422
## 5  1.00 0.19375 0.04573854
## 6  3.00 0.18875 0.03884174
## 7 10.00 0.17750 0.02993047
train_pred.poly = predict(tune.out$best.model)
mean(train_pred.poly != train.Y)
## [1] 0.145
test_pred.poly = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.poly != test.Y)
## [1] 0.1851852

Training error rate = 0.145; test error rate ≈ 0.1852.
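
Collecting the test predictions computed above makes the comparison explicit (a sketch using objects already in the workspace):
data.frame(kernel = c("linear (tuned)", "radial (tuned)", "polynomial, degree 2 (tuned)"),
           test.error = c(mean(test_pred.linear != test.Y),
                          mean(test_pred.rad != test.Y),
                          mean(test_pred.poly != test.Y)))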

  8. The radial-kernel SVM gives the lowest test error rate (≈ 0.163) and therefore the best results on this data set; the polynomial kernel fits the training data best but generalizes slightly worse, and the tuned linear classifier is close behind.