We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
set.seed(1)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1*(x1^2 - x2^2 > 0)
plot(x1, x2, col = y+2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)
title("Simulated data")
fit.logit<-glm(y~x1+x2, family=binomial)
summary(fit.logit)
##
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.179 -1.139 -1.112 1.206 1.257
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260 0.089579 -0.974 0.330
## x1 0.196199 0.316864 0.619 0.536
## x2 -0.002854 0.305712 -0.009 0.993
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.18 on 499 degrees of freedom
## Residual deviance: 691.79 on 497 degrees of freedom
## AIC: 697.79
##
## Number of Fisher Scoring iterations: 3
logit.probs<-predict(fit.logit, type="response")
logit.pred<-rep(0, length(logit.probs))
logit.pred[logit.probs> 0.50] = 1
table(logit.pred, y)
## y
## logit.pred 0 1
## 0 258 212
## 1 3 27
mean(logit.pred != y)
## [1] 0.43
plot(x1, x2, col = logit.pred + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)
The decision boundary between the predicted classes is indeed linear, and the fit is poor: the training error rate of 0.43 is only slightly better than always predicting the majority class.
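To see the boundary explicitly, we can solve the 50% threshold equation b0 + b1*X1 + b2*X2 = 0 for X2 and overlay the resulting line on the plot above. This is a minimal sketch assuming the fit.logit object and the open plot from the previous chunk; b is just an illustrative name for the coefficient vector. Because the estimated coefficient on x2 is close to zero, the line is almost vertical.
b = coef(fit.logit)                      # (Intercept), x1, x2
abline(a = -b[1]/b[3], b = -b[2]/b[3],   # X2 = -(b0 + b1*X1)/b2 at the 0.5 threshold
       lty = 2)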
fit.logit2 = glm(y~ poly(x1, 2) + poly(x2, 2) + I(x1*x2), family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
logit.probs2 = predict(fit.logit2, type = "response")
logit.pred2 = rep(0, length(logit.probs2))
logit.pred2[logit.probs2 > 0.50] = 1
table(logit.pred2, y)
## y
## logit.pred2 0 1
## 0 261 0
## 1 0 239
mean(logit.pred2 != y)
## [1] 0
The model now fits the training data perfectly, with a training error rate of 0%. This is not surprising: the true boundary x1^2 - x2^2 = 0 lies within the span of the quadratic terms included in the model, and the glm.fit warnings above simply reflect the resulting perfect separation of the two classes.
plot(x1, x2, col = logit.pred2 + 2, xlab = expression(X[1]), ylab = expression(X[2]), pch = 16)
The decision boundary is now non-linear.
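To draw the boundary itself rather than only the predicted classes, one option is to evaluate the fitted probabilities on a fine grid and add the 0.5 contour to the plot. A minimal sketch, assuming fit.logit2 and the open plot from the previous chunk; grid and grid.probs are illustrative names.
grid = expand.grid(x1 = seq(-0.5, 0.5, length.out = 200),
                   x2 = seq(-0.5, 0.5, length.out = 200))
grid.probs = predict(fit.logit2, newdata = grid, type = "response")
contour(unique(grid$x1), unique(grid$x2),
        matrix(grid.probs, 200, 200),    # rows follow x1, columns follow x2
        levels = 0.5, add = TRUE, drawlabels = FALSE)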
library(e1071)
dat = data.frame(x2 = x2, x1 = x1, y = as.factor(y))
svmfit.linear = svm(y~., data = dat, kernel = "linear")
plot(svmfit.linear, data = dat)
svmfit.rad = svm(y~., data = data.frame(x1 = x1, x2 = x2, y = as.factor(y)), kernel = "radial" ,gamma = 1, cost = 1)
plot(svmfit.rad, data = dat)
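For a direct comparison with the logistic regression fits, the training error rates of the two SVMs can be computed in the same way. A quick check using the svmfit.linear, svmfit.rad, and dat objects above (predict() on an svm object without newdata returns the fitted training labels):
mean(predict(svmfit.linear) != dat$y)    # training error, linear kernel
mean(predict(svmfit.rad) != dat$y)       # training error, radial kernel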
This problem involves the OJ data set, which is part of the ISLR package.
library(ISLR)
library(e1071)
attach(OJ)
set.seed(1)
train = sample(1:nrow(OJ), 800)
train.Y = Purchase[train]
test.Y = Purchase[-train]
test.X = OJ[-train, ]
svmfit = svm(Purchase~., data = OJ[train, ], kernel = "linear", cost = 0.01)
summary(svmfit)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ[train, ], kernel = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 432
##
## ( 215 217 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train.pred<-predict(svmfit)
mean(train.pred != train.Y)
## [1] 0.16625
test.pred = predict(svmfit, newdata = test.X)
mean(test.pred != test.Y)
## [1] 0.1814815
With cost = 0.01, the linear support vector classifier has a training error rate of 0.16625 and a test error rate of 0.1814815.
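To see how the test errors split between the two brands, we can tabulate the test-set confusion matrix (a quick check using the test.pred and test.Y objects created above):
table(test.pred, test.Y)    # rows: predicted purchase, columns: actual purchase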
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "linear", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.15625
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.16250 0.04124790
## 2 0.03 0.15750 0.04533824
## 3 0.10 0.16000 0.04401704
## 4 0.30 0.16000 0.04158325
## 5 1.00 0.15625 0.04611655
## 6 3.00 0.16125 0.04619178
## 7 10.00 0.16625 0.05036326
train_pred.linear = predict(tune.out$best.model)
mean(train_pred.linear != train.Y)
## [1] 0.15875
test_pred.linear = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.linear != test.Y)
## [1] 0.1925926
After tuning, the best cost is 1; the resulting linear SVM has a training error rate of 0.15875 and a test error rate of 0.1925926, slightly better than the cost = 0.01 fit on the training set but slightly worse on the test set.
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "radial", ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.3
##
## - best performance: 0.17
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.38250 0.04048319
## 2 0.03 0.38000 0.04417453
## 3 0.10 0.17875 0.05036326
## 4 0.30 0.17000 0.04721405
## 5 1.00 0.17125 0.04210189
## 6 3.00 0.17500 0.04208127
## 7 10.00 0.17250 0.03670453
train_pred.rad = predict(tune.out$best.model)
mean(train_pred.rad != train.Y)
## [1] 0.155
test_pred.rad = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.rad != test.Y)
## [1] 0.162963
After tuning, the best cost is 0.3; the radial-kernel SVM has a training error rate of 0.155 and a test error rate of 0.162963, the lowest test error so far.
set.seed(3)
tune.out = tune(svm, Purchase~., data = OJ[train, ], kernel = "polynomial", degree = 2, ranges = list(cost = c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.1775
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.38250 0.04048319
## 2 0.03 0.36250 0.04639804
## 3 0.10 0.32625 0.05787019
## 4 0.30 0.22625 0.03508422
## 5 1.00 0.19375 0.04573854
## 6 3.00 0.18875 0.03884174
## 7 10.00 0.17750 0.02993047
train_pred.poly = predict(tune.out$best.model)
mean(train_pred.poly != train.Y)
## [1] 0.145
test_pred.poly = predict(tune.out$best.model, newdata = test.X)
mean(test_pred.poly != test.Y)
## [1] 0.1851852
After tuning, the best cost is 10; the degree-2 polynomial SVM has a training error rate of 0.145, the lowest of the three kernels, but a test error rate of 0.1851852. Overall, the radial kernel gives the best test performance on this data set.
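For a side-by-side view, the test error rates of the three tuned models can be collected into a single table. A minimal sketch using the prediction objects created above:
data.frame(kernel = c("linear", "radial", "polynomial (degree 2)"),
           test.error = c(mean(test_pred.linear != test.Y),
                          mean(test_pred.rad != test.Y),
                          mean(test_pred.poly != test.Y)))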