We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with n=500 and p=2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows.
x1=runif(500)-0.5
x2=runif(500)-0.5
y=1*(x1^2-x2^2>0)
Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.
test.data=data.frame(x1=x1,x2=x2,y=as.factor(y))
plot(x1,x2,col=(2-y))
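The rule `x1^2-x2^2>0` is the same as `|x1| > |x2|`, so the true class boundary is the pair of diagonals x2 = x1 and x2 = -x1. The following is a minimal sketch that checks this on freshly simulated data and overlays the two lines; the seed and the name `y.alt` are illustrative assumptions, not part of the original analysis.

```r
set.seed(1)  # hypothetical seed, chosen only to make this sketch repeatable
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)

# Equivalent formulation of the same rule: class 1 whenever |x1| exceeds |x2|
y.alt = 1 * (abs(x1) > abs(x2))
print(all(y == y.alt))

# Overlay the true boundary (the two diagonals) on the class plot
plot(x1, x2, col = 2 - y, pch = 19)
abline(a = 0, b = 1, lty = 2)
abline(a = 0, b = -1, lty = 2)
```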
## Question 5c
Fit a logistic regression model to the data, using X1 and X2 as predictors.
test.data.glm.fit=glm(y~x1+x2,data = test.data,family = "binomial")
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
test.data.glm.probs = predict(test.data.glm.fit, newdata=test.data, type = 'response')
test.data.glm.preds = rep(0,500)
test.data.glm.preds[test.data.glm.probs>0.50] = 1
test.data.confm=table(preds=test.data.glm.preds, truth=test.data$y)
plot(x1,x2,col=2-test.data.glm.preds)
Calculate the error rate
(test.data.errorrate=(test.data.confm[1,2]+test.data.confm[2,1])/nrow(test.data))
## [1] 0.354
***As per the plot, the decision boundary is linear. At the same time, the training error rate is high (35.4%) for a data set of only 500 records.***
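The boundary is linear because thresholding the fitted probability at 0.5 is the same as checking the sign of the linear predictor b0 + b1\*x1 + b2\*x2. A rough sketch on simulated data of the same form follows; the seed and the names `fit`, `eta`, `pred.prob`, and `pred.eta` are assumptions made for illustration.

```r
set.seed(1)  # hypothetical seed
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)

fit = glm(y ~ x1 + x2, family = "binomial")
b = unname(coef(fit))

# Linear predictor: its sign decides the predicted class
eta = b[1] + b[2] * x1 + b[3] * x2
pred.eta = 1 * (eta > 0)
pred.prob = 1 * (predict(fit, type = "response") > 0.5)
print(mean(pred.eta == pred.prob))  # share of observations where both rules agree

# The boundary eta = 0 is the straight line x2 = -(b0 + b1*x1)/b2
plot(x1, x2, col = 2 - pred.prob, pch = 19)
abline(a = -b[1] / b[3], b = -b[2] / b[3], lty = 2)
```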
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors.
test.data.glm.fit1= glm(y~I(x1^2)+I(x2^2),data=test.data,family="binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
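These warnings are what one would expect when the classes are perfectly separable in the chosen features: the squared terms reproduce the rule that generated `y`, so the likelihood keeps improving as the coefficients grow and no finite maximum exists. The sketch below, on simulated data with an assumed seed, checks the separation directly; `z` and `fit` are illustrative names.

```r
set.seed(1)  # hypothetical seed
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)

# One derived feature already splits the two classes exactly at zero
z = x1^2 - x2^2
print(range(z[y == 0]))  # never above 0
print(range(z[y == 1]))  # always above 0

# Fit the same model, silencing the warnings only for this demonstration
fit = suppressWarnings(glm(y ~ I(x1^2) + I(x2^2), family = "binomial"))
print(coef(fit))       # the squared-term coefficients tend to be very large
print(fit$converged)   # reports whether the iterations met the tolerance
```

The warnings do not make the predicted classes wrong; they indicate that the coefficient estimates and their standard errors should not be interpreted.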
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be non-linear.
test.data.glm1.probs = predict(test.data.glm.fit1, newdata=test.data, type = 'response')
test.data.glm1.preds = rep(0,500)
test.data.glm1.preds[test.data.glm1.probs>0.50] = 1
test.data.confm1=table(preds=test.data.glm1.preds, truth=test.data$y)
plot(x1,x2,col=2-test.data.glm1.preds)
(test.data.errorrate1=(test.data.confm1[1,2]+test.data.confm1[2,1])/nrow(test.data))
## [1] 0
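The error rates here are computed from the off-diagonal cells of a 2x2 confusion matrix. An equivalent shortcut is the share of mismatched labels, which also works when a model predicts only one class and the table collapses to a single row. A small sketch with made-up labels follows; `error.rate` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: fraction of predictions that differ from the truth
error.rate = function(pred, truth) mean(as.character(pred) != as.character(truth))

truth = factor(c(0, 0, 1, 1, 1))
pred = c(0, 1, 1, 1, 0)

# Same quantity computed from the confusion matrix
confm = table(preds = pred, truth = truth)
from.table = (confm[1, 2] + confm[2, 1]) / length(truth)
from.mean = error.rate(pred, truth)
print(c(from.table = from.table, from.mean = from.mean))

# Edge case: every prediction is 0, so the table has one row and confm[2, 1] would fail
pred.all0 = rep(0, 5)
print(dim(table(preds = pred.all0, truth = truth)))
print(error.rate(pred.all0, truth))
```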
Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels.
library(e1071)  # provides svm() and tune()
test.data.svml.fit= svm(y~.,data = test.data,kernel="linear",cost=10,scale=FALSE)
test.data.svml.pred = predict(test.data.svml.fit, newdata=test.data)
plot(x1,x2,col=test.data.svml.pred)
test.data.svml.confm=table(preds=test.data.svml.pred, truth=test.data$y)
(test.data.svml.errorrate=(test.data.svml.confm[1,2]+test.data.svml.confm[2,1])/nrow(test.data))
## [1] 0.474
Fit an SVM using a non-linear kernel to the data with X1 and X2 as predictors. Obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels.
test.data.svmr.fit= svm(y~.,data = test.data,kernel="radial",gamma=1,cost=10,scale=FALSE)
test.data.svmr.pred = predict(test.data.svmr.fit, newdata=test.data)
plot(x1,x2,col=test.data.svmr.pred)
test.data.svmr.confm=table(preds=test.data.svmr.pred, truth=test.data$y)
(test.data.svmr.errorrate=(test.data.svmr.confm[1,2]+test.data.svmr.confm[2,1])/nrow(test.data))
## [1] 0.022
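Comment on your results.
Both linear fits perform poorly on the training data (error rates of 0.354 for logistic regression and 0.474 for the support vector classifier), because no straight line can separate classes with a quadratic boundary. Logistic regression on the squared terms reaches a training error of 0 and the radial-kernel SVM reaches 0.022, so either non-linear features or a non-linear kernel can recover the boundary.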
In this problem you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(ISLR)  # provides the Auto and OJ data sets
data(Auto)
Create a binary variable that takes on a 1 for cars with gas mileage above the median and a 0 for cars with gas mileage below the median.
set.seed(222)
auto.length = length(Auto$mpg)
auto.mpg.median = median(Auto$mpg)
Auto.mpg01=as.numeric(Auto$mpg > auto.mpg.median)  # 1 if mpg is above the median, otherwise 0
auto.new= data.frame(Auto,Auto.mpg01)
auto.new$Auto.mpg01=as.factor(auto.new$Auto.mpg01)
str(auto.new)
## 'data.frame': 392 obs. of 10 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : num 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : num 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : num 3504 3693 3436 3433 3449 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : num 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : num 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
## $ Auto.mpg01 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage.
autolinear.tune=tune(svm,Auto.mpg01~.,data=auto.new,kernel='linear',
ranges=list(cost=c(0.001,0.01,0.1,1,5,10,100,1000)))
summary(autolinear.tune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.01275641
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.09955128 0.04760888
## 2 1e-02 0.07666667 0.04375200
## 3 1e-01 0.04596154 0.02359743
## 4 1e+00 0.01275641 0.01808165
## 5 5e+00 0.01532051 0.01318724
## 6 1e+01 0.01788462 0.01234314
## 7 1e+02 0.03057692 0.01606420
## 8 1e+03 0.03057692 0.01606420
As per the summary, the best performance occurs at cost=1. As the cost increases beyond that, the cross-validation error starts to increase again.
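To see the shape of the cross-validation curve, the detailed results printed by `summary()` can be re-entered as a small data frame and plotted against log10(cost). This is only a sketch built from the printed values; `perf` and `best` are illustrative names. With a live `tune()` result the same table should be available as the `performances` component of the returned object.

```r
# Cost and CV error values copied from the summary output
perf = data.frame(
  cost  = c(0.001, 0.01, 0.1, 1, 5, 10, 100, 1000),
  error = c(0.09955128, 0.07666667, 0.04596154, 0.01275641,
            0.01532051, 0.01788462, 0.03057692, 0.03057692)
)

# Row with the smallest cross-validation error
best = perf[which.min(perf$error), ]
print(best)

# The error falls until the best cost and does not improve beyond it
plot(log10(perf$cost), perf$error, type = "b",
     xlab = "log10(cost)", ylab = "10-fold CV error")
abline(v = log10(best$cost), lty = 2)
```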
Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma, degree, and cost.
set.seed(222)
# Radial kernel with various values of gamma and cost.
auto.radial.tune=tune(svm, Auto.mpg01~., data=auto.new, kernel='radial',
ranges=list(cost=c(0.1,1,10,100,1000),gamma=c(0.5,1,2,3,4)))
auto.radial.tune$best.parameters
## cost gamma
## 2 1 0.5
auto.radial.tune$best.performance
## [1] 0.04602564
The CV error is lowest for a radial model with cost=1 and gamma=0.5, but the value (0.046) is roughly 3.6 times higher than for the linear model (0.013).
# Polynomial kernel with various degrees
set.seed(222)
auto.poly.tune = tune(svm, Auto.mpg01~., data=auto.new, kernel='polynomial',
ranges=list(cost=c(0.1,1,10,100,1000), degree=c(1,2,3,4,5)))
auto.poly.tune$best.parameters
## cost degree
## 5 1000 1
auto.poly.tune$best.performance
## [1] 0.01019231
The best polynomial model has degree=1 and cost=1000. The lowest CV errors are given by the linear SVM and the polynomial kernel with degree=1, which suggests that the true decision boundary is linear. These models need to be evaluated on a held-out test set to properly ascertain which one is best.
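One caveat when reading these errors: the formula `Auto.mpg01~.` uses every other column as a predictor, including `mpg`, from which the response was derived, and the 304-level `name` factor. A threshold on `mpg` is linearly separable by construction, which may partly explain why the linear fits do so well. The sketch below shows one way to build a formula that leaves such columns out. It uses the built-in `mtcars` data as a stand-in for `Auto`, and all names in it (`cars`, `mpg01`, `predictors`, `f`) are assumptions for illustration. For `Auto` the excluded columns would be `mpg` and, optionally, `name`.

```r
# Stand-in data: mtcars ships with R and also has an mpg column
cars = mtcars
cars$mpg01 = as.factor(as.numeric(cars$mpg > median(cars$mpg)))

# Keep every column except the response and the variable it was derived from
predictors = setdiff(names(cars), c("mpg", "mpg01"))
f = reformulate(predictors, response = "mpg01")
print(f)

# Optional: cross-validated linear SVM without the leaked variable (needs e1071)
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(222)
  fit = e1071::svm(f, data = cars, kernel = "linear", cost = 1, cross = 5)
  print(fit$tot.accuracy)  # overall 5-fold CV accuracy, in percent
}
```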
This problem involves the OJ data set, which is part of the ISLR package.
Create a training set containing a random sample of 800 observations and a test set containing the remaining observations.
library(caTools)  # provides sample.split()
set.seed(131)
# Training and test sets. SplitRatio = 0.8 keeps about 80% of the 1070 rows (stratified by Purchase), not exactly 800.
sample.data = sample.split(OJ$Purchase,SplitRatio = 0.8)
train.set = subset(OJ, sample.data==TRUE)
test.set = subset(OJ, sample.data==FALSE)
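If exactly 800 training rows are wanted, as the exercise asks, one option is to draw row indices with base R's `sample()` instead of splitting by proportion. The sketch below uses a synthetic 1070-row data frame as a stand-in for `OJ`; the names `oj.like`, `train.idx`, `train.demo`, and `test.demo` are hypothetical. Unlike `sample.split()`, this draw is not stratified by class.

```r
set.seed(131)
# Stand-in for OJ: any data frame with 1070 rows behaves the same way
oj.like = data.frame(
  id = 1:1070,
  Purchase = factor(sample(c("CH", "MM"), 1070, replace = TRUE))
)

# Draw 800 distinct row indices; the remaining 270 rows form the test set
train.idx = sample(nrow(oj.like), 800)
train.demo = oj.like[train.idx, ]
test.demo = oj.like[-train.idx, ]

print(c(train = nrow(train.demo), test = nrow(test.demo)))
```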
Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svmfit = svm(Purchase~., data = train.set, kernel = "linear", cost=0.01)
summary(svmfit)
##
## Call:
## svm(formula = Purchase ~ ., data = train.set, kernel = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 461
##
## ( 230 231 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
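According to the summary, the classifier uses 461 support vectors, split 230 and 231 between the two classes. With a cost as small as 0.01 the margin is wide, so a large share of the training observations lie on or inside it and act as support vectors.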
What are the training and test error rates?
svm.pred = predict(svmfit, train.set)
linear.train.confm=table(predict=svm.pred, truth=train.set$Purchase)
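Training error rate
(linear.train.errorrate=(linear.train.confm[1,2]+linear.train.confm[2,1])/nrow(train.set))
The classifier misclassifies 141 training observations, an error rate of about 16.5% assuming the 80% split leaves roughly 856 training rows.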
svm.pred.test = predict(svmfit, test.set)
linear.test.confm=table(predict=svm.pred.test, truth=test.set$Purchase)
Test error rate
(linear.test.errorrate=(linear.test.confm[1,2]+linear.test.confm[2,1])/nrow(test.set))
The classifier misclassifies 38 test observations, an error rate of about 18% assuming roughly 214 test rows.
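When turning a confusion matrix into an error rate, the denominator has to be the number of observations. For a data frame, `length()` returns the number of columns while `nrow()` returns the number of rows, so dividing by `length()` can give values above 1. A small sketch with made-up labels follows; `demo`, `truth`, `pred`, and `misclassified` are illustrative names.

```r
demo = data.frame(a = 1:10, b = 11:20, c = 21:30)
print(length(demo))  # number of columns
print(nrow(demo))    # number of rows

# Made-up predictions for the 10 rows
truth = factor(c("CH", "CH", "MM", "MM", "CH", "MM", "CH", "CH", "MM", "CH"))
pred  = factor(c("CH", "MM", "MM", "MM", "CH", "CH", "CH", "CH", "MM", "CH"))
confm = table(predict = pred, truth = truth)

# Off-diagonal cells are the misclassified observations
misclassified = confm[1, 2] + confm[2, 1]
print(misclassified / nrow(demo))    # a proportion between 0 and 1
print(misclassified / length(demo))  # not an error rate: divides by the column count
```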
Use the tune() function to select the optimal cost. Consider values in the range 0.01 to 10.
set.seed(131)
tune.out = tune(svm, Purchase~., data = train.set, kernel = "linear",
ranges=list(cost=c(0.01,0.1,0.5,1,10)))
tune.out$best.parameters
## cost
## 1 0.01
tune.out$best.model
##
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set,
## ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 461
svm.pred.bm = predict(tune.out$best.model, train.set)
linear.train.confm.bm=table(predict=svm.pred.bm, truth=train.set$Purchase)
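Training error rate
(linear.train.errorrate.bm=(linear.train.confm.bm[1,2]+linear.train.confm.bm[2,1])/nrow(train.set))
The tuned model keeps cost=0.01, so it again misclassifies 141 training observations, about 16.5% assuming roughly 856 training rows.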
svm.pred.test.bm = predict(tune.out$best.model, test.set)
linear.test.confm.bm=table(predict=svm.pred.test.bm, truth=test.set$Purchase)
Test error rate
(linear.test.errorrate.bm=(linear.test.confm.bm[1,2]+linear.test.confm.bm[2,1])/nrow(test.set))
The tuned model misclassifies 38 test observations, about 18% assuming roughly 214 test rows.
Repeat parts (b) through (c) using a support vector machine with a radial kernel. Use the default value for gamma.
set.seed(131)
radial.tune.out = tune(svm, Purchase~., data = train.set, kernel = "radial",
ranges=list(cost=c(0.01,0.1,0.5,1,10)))
radial.tune.out$best.parameters
## cost
## 4 1
radial.tune.out$best.model
##
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set,
## ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 388
svm.radial.pred.bm = predict(radial.tune.out$best.model, train.set)
radial.train.confm.bm=table(predict=svm.radial.pred.bm, truth=train.set$Purchase)
Training error rate
(radial.train.errorrate.bm=(radial.train.confm.bm[1,2]+radial.train.confm.bm[2,1])/nrow(train.set))
The radial model misclassifies 126 training observations, about 14.7% assuming roughly 856 training rows.
svm.radial.pred.test.bm = predict(radial.tune.out$best.model, test.set)
radial.test.confm.bm=table(predict=svm.radial.pred.test.bm, truth=test.set$Purchase)
Test error rate
(radial.test.errorrate.bm=(radial.test.confm.bm[1,2]+radial.test.confm.bm[2,1])/nrow(test.set))
The radial model misclassifies 39 test observations, about 18% assuming roughly 214 test rows.
Repeat parts (b) through (c) using a support vector machine with a polynomial kernel. Set degree=2.
set.seed(131)
poly.tune.out = tune(svm, Purchase~., data = train.set, kernel = "poly",
ranges=list(cost=c(0.01,0.1,0.5,1,10)),degree=2)
poly.tune.out$best.parameters
## cost
## 5 10
poly.tune.out$best.model
##
## Call:
## best.tune(method = svm, train.x = Purchase ~ ., data = train.set,
## ranges = list(cost = c(0.01, 0.1, 0.5, 1, 10)), kernel = "poly",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 357
svm.poly.pred.bm = predict(poly.tune.out$best.model, train.set)
poly.train.confm.bm=table(predict=svm.poly.pred.bm, truth=train.set$Purchase)
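Training error rate
(poly.train.errorrate.bm=(poly.train.confm.bm[1,2]+poly.train.confm.bm[2,1])/nrow(train.set))
The polynomial model misclassifies 126 training observations, about 14.7% assuming roughly 856 training rows.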
svm.poly.pred.test.bm = predict(poly.tune.out$best.model, test.set)
poly.test.confm.bm=table(predict=svm.poly.pred.test.bm, truth=test.set$Purchase)
Test error rate
(poly.test.errorrate.bm=(poly.test.confm.bm[1,2]+poly.test.confm.bm[2,1])/nrow(test.set))
The polynomial model misclassifies 41 test observations, about 19% assuming roughly 214 test rows.
Overall, which approach seems to give the best results on this data?
The optimal radial and polynomial models both have a lower training error rate and a higher test error rate than the linear SVM. This suggests that both models are overfitting the training set when compared to the linear SVM.
The linear SVM with optimal cost has a test error rate that is slightly above its training error rate. The small increase is normal behaviour, and because its test error is still below the radial and polynomial test error rates, the linear SVM appears to be the best model, although the test-set differences are small.