Chapter 09 (page 368): 5, 7, 8
We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
set.seed(12)
x1 = runif(500)-0.5
x2 = runif(500)-0.5
y = 1 * (x1^2 - x2^2 > 0)   # class 1 when x1^2 > x2^2: a quadratic (non-linear) boundary
plot(x1[y==0], x2[y==0], col="#FF6666", xlab="X1", ylab="X2", pch=18)
points(x1[y==1], x2[y==1], col="#66CCFF", pch=16)
dat = data.frame(x1 = x1, x2 = x2, y = as.factor(y))
glm.fit = glm(y~., data = dat, family = "binomial")
summary(glm.fit)
##
## Call:
## glm(formula = y ~ ., family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.350 -1.165 1.050 1.151 1.291
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.04927 0.08978 0.549 0.583
## x1 -0.23002 0.31534 -0.729 0.466
## x2 0.51072 0.31560 1.618 0.106
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.86 on 499 degrees of freedom
## Residual deviance: 689.58 on 497 degrees of freedom
## AIC: 695.58
##
## Number of Fisher Scoring iterations: 3
glm.prob = predict(glm.fit, newdata = dat, type = "response")
glm.pred = ifelse(glm.prob >= 0.5, 1, 0)
data.positive = dat[glm.pred == 1,]
data.negative = dat[glm.pred == 0,]
plot(data.positive$x1, data.positive$x2, col="#FF6666", xlab="X1", ylab="X2", pch=18)
points(data.negative$x1, data.negative$x2, col="#66CCFF", pch=16)
The logistic regression model fit here uses non-linear transformations of the features: $\log\left(\frac{p(X)}{1-p(X)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 \log(x_2) + \beta_4\, x_1 x_2$. Note that \(\log(x_2)\) is undefined for observations with \(x_2 \le 0\), which triggers the NaN warnings below and drops 252 of the 500 observations from the fit. In this model, the second-degree term in \(x_1\) and \(\log(x_2)\) are significant predictors of \(y\).
glm.fit2 = glm(y~poly(x1,2) + log(x2) + I(x1*x2), data=dat, family = "binomial")
## Warning in log(x2): NaNs produced
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glm.fit2)
##
## Call:
## glm(formula = y ~ poly(x1, 2) + log(x2) + I(x1 * x2), family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.02566 -0.12847 0.00022 0.14337 1.60700
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.817 1.504 -5.864 4.53e-09 ***
## poly(x1, 2)1 -10.434 20.032 -0.521 0.602
## poly(x1, 2)2 90.385 14.996 6.027 1.67e-09 ***
## log(x2) -6.237 1.025 -6.085 1.16e-09 ***
## I(x1 * x2) 7.625 9.572 0.797 0.426
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 341.847 on 247 degrees of freedom
## Residual deviance: 90.027 on 243 degrees of freedom
## (252 observations deleted due to missingness)
## AIC: 100.03
##
## Number of Fisher Scoring iterations: 8
The decision boundary implied by this fit is completely different from the linear one above; the predicted classes plotted below cannot be separated by a linear decision boundary.
glm.probs2 = predict(glm.fit2, newdata = dat, type = "response")
## Warning in log(x2): NaNs produced
glm.pred2 = ifelse(glm.probs2 >= 0.5, 1, 0)
data.positive2 = dat[glm.pred2 == 1,]
data.negative2 = dat[glm.pred2 == 0,]
plot(data.positive2$x1, data.positive2$x2, col="#FF6666", xlab="X1", ylab="X2", pch=18)
points(data.negative2$x1, data.negative2$x2, col="#66CCFF", pch=16)
The Support Vector Classifier almost perfectly predicts the classes of the observations, but shows some confusion towards the middle of the plot.
library(e1071)
svm.fit = svm(as.factor(y)~ x1 + x2, data = dat, kernal = "linear", cost = 0.1)
svm.pred = predict(svm.fit, dat)
svm.positive = dat[svm.pred == 1,]
svm.negative = dat[svm.pred == 0,]
plot(svm.positive$x1, svm.positive$x2, col="#FF6666", xlab="X1", ylab="X2", pch=18)
points(svm.negative$x1, svm.negative$x2, col="#66CCFF", pch=16)
The Non-Linear Kernel SVM does a much better job of prediction on this simulated dataset.
library(e1071)
svm.fit2=svm(as.factor(y)~x1+x2, dat, kernel="radial", gamma=1, cost=1)
svm.pred2=predict(svm.fit2, dat)
svm.positive2= dat[svm.pred2==1,]
svm.negative2= dat[svm.pred2==0,]
plot(svm.positive2$x1, svm.positive2$x2, col="#FF6666", xlab="X1", ylab="X2", pch=18)
points(svm.negative2$x1, svm.negative2$x2, col="#66CCFF", pch=16)
The most powerful model was the Non-Linear SVM; with it, there was the least confusion between the classes. The Linear and Non-Linear Logistic Regressions predicted the dataset very poorly.
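To quantify this comparison, one can compute the training error rate of each fit directly; a minimal sketch, assuming the prediction objects created above (glm.pred, glm.pred2, svm.pred, svm.pred2) are still in the workspace:
# training error rate of each fit (glm.pred2 has NAs where log(x2) was undefined)
mean(glm.pred != y)                 # linear logistic regression
mean(glm.pred2 != y, na.rm = TRUE)  # logistic regression with transformed features
mean(svm.pred != dat$y)             # support vector classifier
mean(svm.pred2 != dat$y)            # radial-kernel SVM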
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(ISLR)
data(Auto)
summary(Auto)
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
gas.median = median(Auto$mpg)
gas.class = ifelse(Auto$mpg > gas.median, 1, 0)
Auto$mpglevel = as.factor(gas.class)
str(Auto$mpglevel)
## Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
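As a quick sanity check (not part of the original output), the median split should produce two roughly equal classes:
table(Auto$mpglevel)   # counts of low-MPG (0) and high-MPG (1) cars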
For this problem, a Cost value of 100 is selected by cross-validation; at Cost = 100, the cross-validation error is minimized at 0.01262821.
set.seed(10)
tune.out=tune(svm, mpglevel~., data=Auto, kernal="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1,5,10,100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.01262821
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.55365385 0.03306661
## 2 1e-02 0.55365385 0.03306661
## 3 1e-01 0.09942308 0.04714670
## 4 1e+00 0.07897436 0.03260883
## 5 5e+00 0.06878205 0.03175943
## 6 1e+01 0.05352564 0.03055668
## 7 1e+02 0.01262821 0.02437031
tune.out$best.parameters
## cost
## 7 100
Per cross-validation, R has selected a Radial SVM model with C = 100 and 63 Support Vectors. Of the 63 Support Vectors, 30 belong to the "low MPG" class (0) and 33 to the "high MPG" class (1).
best.svmLinear = tune.out$best.model
summary(best.svmLinear)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto, ranges = list(cost = c(0.001,
## 0.01, 0.1, 1, 5, 10, 100)), kernal = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
For the Radial Kernel, the selected model has a Cost of 100 and a Gamma of 0.01. It uses 57 Support Vectors, of which 27 belong to the "Low MPG" class (0) and the remaining 30 to the "High MPG" class (1). The cross-validation error for this model is 0.01025641.
set.seed(123)
tune.out.rad = tune(svm, mpglevel~., data=Auto, kernal="radial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), gamma=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out.rad)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 0.01
##
## - best performance: 0.01025641
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-03 1e-03 0.58173077 0.04740051
## 2 1e-02 1e-03 0.58173077 0.04740051
## 3 1e-01 1e-03 0.56891026 0.06627739
## 4 1e+00 1e-03 0.09173077 0.03990003
## 5 5e+00 1e-03 0.07634615 0.03928191
## 6 1e+01 1e-03 0.07121795 0.04410874
## 7 1e+02 1e-03 0.02288462 0.01427008
## 8 1e-03 1e-02 0.58173077 0.04740051
## 9 1e-02 1e-02 0.58173077 0.04740051
## 10 1e-01 1e-02 0.08916667 0.04345384
## 11 1e+00 1e-02 0.07378205 0.04185248
## 12 5e+00 1e-02 0.04589744 0.03136327
## 13 1e+01 1e-02 0.02032051 0.02305327
## 14 1e+02 1e-02 0.01025641 0.01792836
## 15 1e-03 1e-01 0.58173077 0.04740051
## 16 1e-02 1e-01 0.21391026 0.09431095
## 17 1e-01 1e-01 0.07634615 0.03928191
## 18 1e+00 1e-01 0.05852564 0.03960325
## 19 5e+00 1e-01 0.03057692 0.02611396
## 20 1e+01 1e-01 0.03314103 0.02942215
## 21 1e+02 1e-01 0.03326923 0.02434857
## 22 1e-03 1e+00 0.58173077 0.04740051
## 23 1e-02 1e+00 0.58173077 0.04740051
## 24 1e-01 1e+00 0.58173077 0.04740051
## 25 1e+00 1e+00 0.05865385 0.04942437
## 26 5e+00 1e+00 0.05608974 0.04595880
## 27 1e+01 1e+00 0.05608974 0.04595880
## 28 1e+02 1e+00 0.05608974 0.04595880
## 29 1e-03 5e+00 0.58173077 0.04740051
## 30 1e-02 5e+00 0.58173077 0.04740051
## 31 1e-01 5e+00 0.58173077 0.04740051
## 32 1e+00 5e+00 0.51544872 0.06790600
## 33 5e+00 5e+00 0.51544872 0.06790600
## 34 1e+01 5e+00 0.51544872 0.06790600
## 35 1e+02 5e+00 0.51544872 0.06790600
## 36 1e-03 1e+01 0.58173077 0.04740051
## 37 1e-02 1e+01 0.58173077 0.04740051
## 38 1e-01 1e+01 0.58173077 0.04740051
## 39 1e+00 1e+01 0.54602564 0.06355090
## 40 5e+00 1e+01 0.54102564 0.06959451
## 41 1e+01 1e+01 0.54102564 0.06959451
## 42 1e+02 1e+01 0.54102564 0.06959451
## 43 1e-03 1e+02 0.58173077 0.04740051
## 44 1e-02 1e+02 0.58173077 0.04740051
## 45 1e-01 1e+02 0.58173077 0.04740051
## 46 1e+00 1e+02 0.58173077 0.04740051
## 47 5e+00 1e+02 0.58173077 0.04740051
## 48 1e+01 1e+02 0.58173077 0.04740051
## 49 1e+02 1e+02 0.58173077 0.04740051
tune.out.rad$best.performance
## [1] 0.01025641
best.rad.model = tune.out.rad$best.model
summary(best.rad.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto, ranges = list(cost = c(0.001,
## 0.01, 0.1, 1, 5, 10, 100), gamma = c(0.001, 0.01, 0.1, 1, 5,
## 10, 100)), kernal = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
##
## Number of Support Vectors: 57
##
## ( 27 30 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
For the Polynomial Kernel, the chosen model has a Cost of 100 and a Polynomial Degree of 2. It uses 63 Support Vectors, of which 30 belong to the "Low MPG" class and the remaining 33 to the "High MPG" class. The cross-validation error for this model is 0.01282051.
set.seed(123)
tune.out.poly = tune(svm, mpglevel~., data=Auto, kernal="polynomial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), degree=c(2,3,4,5)))
summary(tune.out.poly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 2
##
## - best performance: 0.01282051
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-03 2 0.58173077 0.04740051
## 2 1e-02 2 0.58173077 0.04740051
## 3 1e-01 2 0.10692308 0.05900981
## 4 1e+00 2 0.07891026 0.03828837
## 5 5e+00 2 0.06608974 0.04785032
## 6 1e+01 2 0.05602564 0.03551922
## 7 1e+02 2 0.01282051 0.01813094
## 8 1e-03 3 0.58173077 0.04740051
## 9 1e-02 3 0.58173077 0.04740051
## 10 1e-01 3 0.10692308 0.05900981
## 11 1e+00 3 0.07891026 0.03828837
## 12 5e+00 3 0.06608974 0.04785032
## 13 1e+01 3 0.05602564 0.03551922
## 14 1e+02 3 0.01282051 0.01813094
## 15 1e-03 4 0.58173077 0.04740051
## 16 1e-02 4 0.58173077 0.04740051
## 17 1e-01 4 0.10692308 0.05900981
## 18 1e+00 4 0.07891026 0.03828837
## 19 5e+00 4 0.06608974 0.04785032
## 20 1e+01 4 0.05602564 0.03551922
## 21 1e+02 4 0.01282051 0.01813094
## 22 1e-03 5 0.58173077 0.04740051
## 23 1e-02 5 0.58173077 0.04740051
## 24 1e-01 5 0.10692308 0.05900981
## 25 1e+00 5 0.07891026 0.03828837
## 26 5e+00 5 0.06608974 0.04785032
## 27 1e+01 5 0.05602564 0.03551922
## 28 1e+02 5 0.01282051 0.01813094
tune.out.poly$best.performance
## [1] 0.01282051
best.poly.model = tune.out.poly$best.model
summary(best.poly.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto, ranges = list(cost = c(0.001,
## 0.01, 0.1, 1, 5, 10, 100), degree = c(2, 3, 4, 5)), kernal = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
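Before moving to the plots, the best cross-validation errors of the three tuning runs can be collected for a side-by-side comparison; a short sketch reusing the tune objects created above (labels follow the kernels named in the text):
# best CV error reported by each tuning run
c(linear     = tune.out$best.performance,
  radial     = tune.out.rad$best.performance,
  polynomial = tune.out.poly$best.performance)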
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing
plot(svmfit, dat)
where svmfit contains your fitted model and dat is a data frame containing your data, you can type
plot(svmfit, dat, x1~x4)
in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
Linear Support Vector Classifier Plots
svm.linear=svm(mpglevel~., data=Auto, kernel="linear", cost=100)
svm.rad=svm(mpglevel~., data=Auto, kernel="radial", cost=100, gamma=0.01)
svm.poly=svm(mpglevel~., data=Auto, kernel="polynomial", cost=100, degree=2)
plotpairs = function(autofit) {
  # plot the fitted SVM against mpg paired with each remaining predictor,
  # skipping the response (mpglevel), the raw mpg column, and the name factor
  for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
    plot(autofit, Auto, as.formula(paste("mpg~", name, sep = "")))
  }
}
plotpairs(svm.linear)
Radial Support Vector Classifiers
plotpairs(svm.rad)
Polynomial Support Vector Classifier
plotpairs(svm.poly)
This problem involves the OJ data set which is part of the ISLR package.
library(ISLR)
data(OJ)
summary(OJ)
## Purchase WeekofPurchase StoreID PriceCH PriceMM
## CH:653 Min. :227.0 Min. :1.00 Min. :1.690 Min. :1.690
## MM:417 1st Qu.:240.0 1st Qu.:2.00 1st Qu.:1.790 1st Qu.:1.990
## Median :257.0 Median :3.00 Median :1.860 Median :2.090
## Mean :254.4 Mean :3.96 Mean :1.867 Mean :2.085
## 3rd Qu.:268.0 3rd Qu.:7.00 3rd Qu.:1.990 3rd Qu.:2.180
## Max. :278.0 Max. :7.00 Max. :2.090 Max. :2.290
## DiscCH DiscMM SpecialCH SpecialMM
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.05186 Mean :0.1234 Mean :0.1477 Mean :0.1617
## 3rd Qu.:0.00000 3rd Qu.:0.2300 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :0.50000 Max. :0.8000 Max. :1.0000 Max. :1.0000
## LoyalCH SalePriceMM SalePriceCH PriceDiff Store7
## Min. :0.000011 Min. :1.190 Min. :1.390 Min. :-0.6700 No :714
## 1st Qu.:0.325257 1st Qu.:1.690 1st Qu.:1.750 1st Qu.: 0.0000 Yes:356
## Median :0.600000 Median :2.090 Median :1.860 Median : 0.2300
## Mean :0.565782 Mean :1.962 Mean :1.816 Mean : 0.1465
## 3rd Qu.:0.850873 3rd Qu.:2.130 3rd Qu.:1.890 3rd Qu.: 0.3200
## Max. :0.999947 Max. :2.290 Max. :2.090 Max. : 0.6400
## PctDiscMM PctDiscCH ListPriceDiff STORE
## Min. :0.0000 Min. :0.00000 Min. :0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.140 1st Qu.:0.000
## Median :0.0000 Median :0.00000 Median :0.240 Median :2.000
## Mean :0.0593 Mean :0.02731 Mean :0.218 Mean :1.631
## 3rd Qu.:0.1127 3rd Qu.:0.00000 3rd Qu.:0.300 3rd Qu.:3.000
## Max. :0.4020 Max. :0.25269 Max. :0.440 Max. :4.000
str(OJ)
## 'data.frame': 1070 obs. of 18 variables:
## $ Purchase : Factor w/ 2 levels "CH","MM": 1 1 1 2 1 1 1 1 1 1 ...
## $ WeekofPurchase: num 237 239 245 227 228 230 232 234 235 238 ...
## $ StoreID : num 1 1 1 1 7 7 7 7 7 7 ...
## $ PriceCH : num 1.75 1.75 1.86 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceMM : num 1.99 1.99 2.09 1.69 1.69 1.99 1.99 1.99 1.99 1.99 ...
## $ DiscCH : num 0 0 0.17 0 0 0 0 0 0 0 ...
## $ DiscMM : num 0 0.3 0 0 0 0 0.4 0.4 0.4 0.4 ...
## $ SpecialCH : num 0 0 0 0 0 0 1 1 0 0 ...
## $ SpecialMM : num 0 1 0 0 0 1 1 0 0 0 ...
## $ LoyalCH : num 0.5 0.6 0.68 0.4 0.957 ...
## $ SalePriceMM : num 1.99 1.69 2.09 1.69 1.69 1.99 1.59 1.59 1.59 1.59 ...
## $ SalePriceCH : num 1.75 1.75 1.69 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceDiff : num 0.24 -0.06 0.4 0 0 0.3 -0.1 -0.16 -0.16 -0.16 ...
## $ Store7 : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
## $ PctDiscMM : num 0 0.151 0 0 0 ...
## $ PctDiscCH : num 0 0 0.0914 0 0 ...
## $ ListPriceDiff : num 0.24 0.24 0.23 0 0 0.3 0.3 0.24 0.24 0.24 ...
## $ STORE : num 1 1 1 1 0 0 0 0 0 0 ...
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
set.seed(1234)
oj.intrain <- createDataPartition(OJ$Purchase, p = 0.746, list = FALSE)
oj.train <- OJ[oj.intrain,]
oj.test <- OJ[-oj.intrain,]
dim(oj.train)
## [1] 800 18
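The remaining observations form the test set. As a quick check (not part of the original output), the test set should hold the other 270 rows, and the stratified split should preserve the CH/MM balance:
dim(oj.test)                           # expect 270 rows and 18 columns
prop.table(table(oj.train$Purchase))   # class proportions in the training set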
The Support Vector Classifier with Cost = 0.01 uses 439 Support Vectors, 221 from the CH class and 218 from the MM class.
library(e1071)
oj.svm <- svm(Purchase~., data = oj.train, kernel = "linear", cost = 0.01)
summary(oj.svm)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 439
##
## ( 221 218 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The Train Error Rate is 17.63%.
oj.train.pred <- predict(oj.svm, oj.train)
table(oj.train$Purchase, oj.train.pred)
## oj.train.pred
## CH MM
## CH 430 58
## MM 83 229
(83+58)/800
## [1] 0.17625
The Test Error Rate is 14.44%.
oj.test.pred <- predict(oj.svm, oj.test)
table(oj.test$Purchase, oj.test.pred)
## oj.test.pred
## CH MM
## CH 147 18
## MM 21 84
(21+18)/270
## [1] 0.1444444
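Since the same confusion-table arithmetic is repeated for every model below, a small helper avoids re-typing it; this is a sketch, not part of the original code, and the name err.rate is arbitrary:
# fraction of misclassified observations
err.rate = function(pred, truth) mean(pred != truth)
err.rate(oj.test.pred, oj.test$Purchase)   # should reproduce the 14.44% above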
The optimal Cost value is 0.1 with an error of 0.17625.
set.seed(1234)
oj.tune.out = tune(svm, Purchase ~., data = oj.train, kernel = "linear", ranges = list(cost=c(0.001, 0.01, 0.1, 1, 5, 10)))
summary(oj.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.17625
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.33750 0.07264832
## 2 1e-02 0.18000 0.04937104
## 3 1e-01 0.17625 0.05816941
## 4 1e+00 0.18000 0.05986095
## 5 5e+00 0.17625 0.05478810
## 6 1e+01 0.17750 0.05583955
oj.tune.out$best.parameters
## cost
## 3 0.1
With the Cost Value = 0.1, the Training Error Rate is 16.88%.
oj.new.svm = svm(Purchase ~., kernel = "linear", data = oj.train, cost = oj.tune.out$best.parameters$cost)
oj.new.train.pred = predict(oj.new.svm, oj.train)
table(oj.train$Purchase, oj.new.train.pred)
## oj.new.train.pred
## CH MM
## CH 428 60
## MM 75 237
(75 + 60)/800
## [1] 0.16875
The Test Error Rate with Cost = 0.1 is 14.07%.
oj.new.test.pred = predict(oj.new.svm, oj.test)
table(oj.test$Purchase, oj.new.test.pred)
## oj.new.test.pred
## CH MM
## CH 147 18
## MM 20 85
(20 + 18)/270
## [1] 0.1407407
With a Radial SVM where Cost = 0.01, 625 Support Vectors are used: 313 from the CH class and the remaining 312 from the MM class. On the training dataset, this Radial SVM results in a training error rate of 39%.
oj.svm.rad = svm(Purchase~., data = oj.train, kernel = "radial", cost = 0.01)
summary(oj.svm.rad)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
##
## Number of Support Vectors: 625
##
## ( 313 312 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
oj.rad.train.pred = predict(oj.svm.rad, oj.train)
table(oj.train$Purchase, oj.rad.train.pred)
## oj.rad.train.pred
## CH MM
## CH 488 0
## MM 312 0
312/800
## [1] 0.39
With Cost = 0.01, the Radial SVM has a Test Error Rate of 38.89%.
oj.rad.test.pred = predict(oj.svm.rad, oj.test)
table(oj.test$Purchase, oj.rad.test.pred)
## oj.rad.test.pred
## CH MM
## CH 165 0
## MM 105 0
105/270
## [1] 0.3888889
After tuning the Radial SVM, the best Cost value is 1, with a cross-validation error of 0.17750.
set.seed(1234)
oj.rad.tune = tune(svm, Purchase~., data = oj.train, kernel = "radial", ranges = list(cost=c(0.001, 0.01, 0.1, 1, 5, 10)))
summary(oj.rad.tune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.1775
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.39000 0.04706674
## 2 1e-02 0.39000 0.04706674
## 3 1e-01 0.18750 0.03632416
## 4 1e+00 0.17750 0.04440971
## 5 5e+00 0.18250 0.03446012
## 6 1e+01 0.18875 0.03747684
The Radial SVM with Cost = 1 uses 375 Support Vectors, 189 from the CH class and 186 from the MM class. On the Training Dataset, the Error Rate is 15.25%.
oj.svm.rad2 = svm(Purchase~., data = oj.train, kernel = "radial", cost = oj.rad.tune$best.parameters$cost)
summary(oj.svm.rad2)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "radial", cost = oj.rad.tune$best.parameters$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 375
##
## ( 189 186 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
oj.train.rad.pred2 = predict(oj.svm.rad2, oj.train)
table(oj.train$Purchase, oj.train.rad.pred2)
## oj.train.rad.pred2
## CH MM
## CH 445 43
## MM 79 233
(79 + 43)/800
## [1] 0.1525
On the Test Data, the Error Rate is 15.19% for the Radial SVM with Cost = 1.
oj.test.rad.pred2 = predict(oj.svm.rad2, oj.test)
table(oj.test$Purchase, oj.test.rad.pred2)
## oj.test.rad.pred2
## CH MM
## CH 150 15
## MM 26 79
(26+15)/270
## [1] 0.1518519
The Polynomial SVM with Cost = 0.01 and Degree = 2 uses 629 Support Vectors, 317 from the CH class and 312 from the MM class. The Training Error is 39%.
oj.svm.poly = svm(Purchase~., data = oj.train, kernel = "poly", cost = 0.01, degree = 2)
summary(oj.svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "poly", cost = 0.01,
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.01
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 629
##
## ( 317 312 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
oj.poly.train.pred = predict(oj.svm.poly, oj.train)
table(oj.train$Purchase, oj.poly.train.pred)
## oj.poly.train.pred
## CH MM
## CH 488 0
## MM 312 0
312/800
## [1] 0.39
With Cost = 0.01 and Degree = 2, the Polynomial SVM has a Test Error Rate of 38.89%.
oj.poly.test.pred = predict(oj.svm.poly, oj.test)
table(oj.test$Purchase, oj.poly.test.pred)
## oj.poly.test.pred
## CH MM
## CH 165 0
## MM 105 0
105/270
## [1] 0.3888889
After tuning the Polynomial SVM, the best Cost value is 10, with a cross-validation error of 0.18625.
set.seed(1234)
oj.poly.tune = tune(svm, Purchase~., data = oj.train, kernel = "poly", ranges = list(cost=c(0.001, 0.01, 0.1, 1, 5, 10)), degree = 2)
summary(oj.poly.tune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.18625
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.39000 0.04706674
## 2 1e-02 0.39000 0.04706674
## 3 1e-01 0.30875 0.05804991
## 4 1e+00 0.20500 0.05109903
## 5 5e+00 0.19125 0.03998698
## 6 1e+01 0.18625 0.04767147
oj.poly.tune$best.parameters
## cost
## 6 10
The Polynomial SVM with Cost = 10 uses 344 Support Vectors, 175 from the CH class and 169 from the MM class. On the Training Dataset, the Error Rate is 15%.
oj.svm.poly2 = svm(Purchase~., data = oj.train, kernel = "poly", cost = oj.poly.tune$best.parameters$cost, degree = 2)
summary(oj.svm.poly2)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "poly", cost = oj.poly.tune$best.parameters$cost,
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 344
##
## ( 175 169 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
oj.train.poly.pred2 = predict(oj.svm.poly2, oj.train)
table(oj.train$Purchase, oj.train.poly.pred2)
## oj.train.poly.pred2
## CH MM
## CH 447 41
## MM 79 233
(79+41)/800
## [1] 0.15
On the Test Data, the Error Rate is 17.78% for the Polynomial SVM with Cost = 10 and Degree = 2.
oj.test.poly.pred2 = predict(oj.svm.poly2, oj.test)
table(oj.test$Purchase, oj.test.poly.pred2)
## oj.test.poly.pred2
## CH MM
## CH 152 13
## MM 35 70
(35+13)/270
## [1] 0.1777778
Comparing the test error rates above, the tuned Linear Support Vector Classifier (Cost = 0.1, test error 14.07%) performs best on the OJ data set, followed by the tuned Radial SVM (15.19%) and the tuned Polynomial SVM (17.78%).
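Collecting the test error rates computed above into one place makes the comparison explicit; a short sketch reusing the tuned-model predictions from earlier:
# test error rates of the three tuned models on the held-out set
c(linear.cost0.1   = mean(oj.new.test.pred  != oj.test$Purchase),
  radial.cost1     = mean(oj.test.rad.pred2  != oj.test$Purchase),
  poly.cost10.deg2 = mean(oj.test.poly.pred2 != oj.test$Purchase))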