We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
x1=runif(500)-0.5
x2=runif(500)-.5
y=1*(x1^2-x2^2>0)
plot(x1[y==0], x2[y==0], col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(x1[y==1], x2[y==1], col="paleturquoise3", pch=16)
From the look of this plot, the decision boundary seems to be non-linear.
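Since y was generated as 1 whenever x1^2 - x2^2 > 0, the true boundary is the pair of diagonals x2 = x1 and x2 = -x1. As a minimal sketch (assuming the plot above is still the active device), they can be overlaid with:
abline(a=0, b=1, lty=2)    # true boundary: x2 = x1
abline(a=0, b=-1, lty=2)   # true boundary: x2 = -x1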
dat=data.frame(x1=x1,x2=x2, y=as.factor(y))
reg.fit=glm(y~., data=dat, family = "binomial")
summary(reg.fit)
##
## Call:
## glm(formula = y ~ ., family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.279 -1.189 1.078 1.143 1.254
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.03564 0.09011 0.396 0.692
## x1 -0.33740 0.31142 -1.083 0.279
## x2 0.11281 0.30999 0.364 0.716
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.95 on 499 degrees of freedom
## Residual deviance: 691.65 on 497 degrees of freedom
## AIC: 697.65
##
## Number of Fisher Scoring iterations: 3
Looking at the logistic regression summary, neither variable is significant for predicting y. This is expected: y depends on x1 and x2 only through \(X_1^2 - X_2^2\), so there is no linear relationship for the model to detect.
reg.prob=predict(reg.fit, newdata=dat, type="response")
reg.pred=ifelse(reg.prob>=0.5, 1, 0)
data.positive=dat[reg.pred==1,]
data.negative=dat[reg.pred==0,]
plot(data.positive$x1, data.positive$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(data.negative$x1, data.negative$x2, col="paleturquoise3", pch=16)
As expected, the fitted decision boundary is linear, but the model assigns nearly all of the observations to a single class.
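A minimal sketch to quantify this, tabulating the linear logistic predictions against the true classes:
table(predicted=reg.pred, actual=y)   # most observations fall in one predicted class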
reg.fit2=glm(y~poly(x1,3) + poly(x1,3) + I(x1*x2), data=dat, family = "binomial")
summary(reg.fit2)
##
## Call:
## glm(formula = y ~ poly(x1, 3) + poly(x1, 3) + I(x1 * x2), family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6396 -0.7200 0.2027 0.7320 1.9134
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2020 0.1160 1.742 0.0814 .
## poly(x1, 3)1 -2.4663 3.3272 -0.741 0.4585
## poly(x1, 3)2 36.6567 3.3184 11.047 <2e-16 ***
## poly(x1, 3)3 0.7005 3.3769 0.207 0.8357
## I(x1 * x2) 0.6423 1.4974 0.429 0.6679
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.95 on 499 degrees of freedom
## Residual deviance: 493.35 on 495 degrees of freedom
## AIC: 503.35
##
## Number of Fisher Scoring iterations: 5
For this non-linear logistic regression model, only the quadratic term in \(X_1\) (the poly(x1, 3)2 coefficient) is significant, which makes sense because y was generated from \(X_1^2 - X_2^2\). Note that the formula includes poly(x1, 3) twice rather than poly(x2, 3), so no polynomial terms in \(X_2\) enter the model.
reg.prob2=predict(reg.fit2, newdata=dat, type = "response")
reg.pred2=ifelse(reg.prob2>=0.6, 1, 0)
data.positive2=dat[reg.pred2==1,]
data.negative2=dat[reg.pred2==0,]
plot(data.positive2$x1, data.positive2$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(data.negative2$x1, data.negative2$x2, col="paleturquoise3", pch=16)
This non-linear decision boundary is completely different from the boundaries found before: it splits the data roughly through the horizontal middle and bears some resemblance to the class separation in the original data plot.
library(e1071)
svm.fit=svm(as.factor(y)~x1+x2, dat, kernel="linear", cost=1)
svm.pred=predict(svm.fit, dat)
svm.positive= dat[svm.pred==1,]
svm.negative= dat[svm.pred==0,]
plot(svm.positive$x1, svm.positive$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(svm.negative$x1, svm.negative$x2, col="paleturquoise3", pch=16)
Similar to the linear logistic model, the linear-kernel SVM assigns nearly all points to a single class, with only a roughly linear boundary toward the bottom of the plot.
library(e1071)
svm.fit2=svm(as.factor(y)~x1+x2, dat, kernel="radial", gamma=1, cost=1)
svm.pred2=predict(svm.fit2, dat)
svm.positive2= dat[svm.pred2==1,]
svm.negative2= dat[svm.pred2==0,]
plot(svm.positive2$x1, svm.positive2$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(svm.negative2$x1, svm.negative2$x2, col="paleturquoise3", pch=16)
For the non-linear support vector machine, I used the radial kernel. The radial kernel produces a much crisper decision boundary and essentially recovers the true boundary in the data: compared with the original plot there is far less mixing between the classes, and you could almost draw two perpendicular diagonal lines to separate them.
The polynomial logistic regression model with the interaction term and the radial-kernel SVM were the most effective at recovering the non-linear decision boundary, while the basic logistic regression and the linear-kernel SVM performed poorly.
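As a rough numerical check on this comparison, the training accuracy of each fit can be computed from the predictions already in hand (a minimal sketch):
c(logit.linear = mean(reg.pred == y),    # linear logistic regression
  logit.poly   = mean(reg.pred2 == y),   # polynomial logistic regression
  svm.linear   = mean(svm.pred == y),    # linear-kernel SVM
  svm.radial   = mean(svm.pred2 == y))   # radial-kernel SVM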
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(ISLR)
attach(Auto)
summary(Auto)
## mpg cylinders displacement horsepower
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0
##
## weight acceleration year origin
## Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
## 1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000
## Median :2804 Median :15.50 Median :76.00 Median :1.000
## Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577
## 3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000
## Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
##
## name
## amc matador : 5
## ford pinto : 5
## toyota corolla : 5
## amc gremlin : 4
## amc hornet : 4
## chevrolet chevette: 4
## (Other) :365
gas.median=median(Auto$mpg)
gas.class=ifelse(Auto$mpg>gas.median, 1, 0)
Auto$mpglevel=as.factor(gas.class)
str(Auto$mpglevel)
## Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
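A quick sketch to confirm that the median split gives two roughly equal classes:
table(Auto$mpglevel)   # counts of low-MPG (0) and high-MPG (1) cars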
set.seed(10)
tune.out=tune(svm, mpglevel~., data=Auto, kernal="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1,5,10,100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.01269231
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.50602564 0.16410707
## 2 1e-02 0.50602564 0.16410707
## 3 1e-01 0.10461538 0.06651363
## 4 1e+00 0.07647436 0.03377427
## 5 5e+00 0.06621795 0.04795478
## 6 1e+01 0.05346154 0.04378538
## 7 1e+02 0.01269231 0.01783081
Based on the cross-validation summary, the lowest error rate occurs at cost=100, not at a smaller cost; the best-model summary below gives more detail. (Note that because the kernel argument is misspelled as "kernal", svm() silently ignores it and uses its default radial kernel, which is why that summary reports a radial kernel.) The cross-validation error decreases steadily with cost: 0.50602564 for cost=0.001 and cost=0.01, 0.10461538 for cost=0.1, 0.07647436 for cost=1, 0.06621795 for cost=5, 0.05346154 for cost=10, and a minimum of 0.01269231 for cost=100.
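For reference, here is a sketch of the tuning call with the kernel argument spelled correctly (tune.linear is a name used only for illustration; its cross-validation results would differ from the output above):
tune.linear=tune(svm, mpglevel~., data=Auto, kernel="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.linear)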
best.model=tune.out$best.model
summary(best.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernal = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.003205128
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
The best model among the candidate cost values is the one with cost=100. As the summary shows, it was fit with a radial kernel (gamma=0.003205128) and has 63 support vectors: 30 from the "low MPG" class (coded 0) and 33 from the "high MPG" class (coded 1).
The first model that I will test is the radial-kernel SVM.
set.seed(10)
tune.out2=tune(svm, mpglevel~., data=Auto, kernal="radial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), gamma=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out2)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 0.01
##
## - best performance: 0.01275641
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-03 1e-03 0.50852564 0.15639743
## 2 1e-02 1e-03 0.50852564 0.15639743
## 3 1e-01 1e-03 0.47019231 0.15169894
## 4 1e+00 1e-03 0.09179487 0.04824406
## 5 5e+00 1e-03 0.07391026 0.03469713
## 6 1e+01 1e-03 0.07141026 0.03740622
## 7 1e+02 1e-03 0.02794872 0.02510152
## 8 1e-03 1e-02 0.50602564 0.16410707
## 9 1e-02 1e-02 0.50602564 0.16410707
## 10 1e-01 1e-02 0.08923077 0.04516267
## 11 1e+00 1e-02 0.07141026 0.03740622
## 12 5e+00 1e-02 0.05096154 0.03977718
## 13 1e+01 1e-02 0.02282051 0.02780768
## 14 1e+02 1e-02 0.01275641 0.01344780
## 15 1e-03 1e-01 0.50602564 0.16410707
## 16 1e-02 1e-01 0.18634615 0.08991135
## 17 1e-01 1e-01 0.07903846 0.03260217
## 18 1e+00 1e-01 0.05089744 0.04627556
## 19 5e+00 1e-01 0.02544872 0.01688453
## 20 1e+01 1e-01 0.02038462 0.01617396
## 21 1e+02 1e-01 0.02557692 0.02417544
## 22 1e-03 1e+00 0.50602564 0.16410707
## 23 1e-02 1e+00 0.50602564 0.16410707
## 24 1e-01 1e+00 0.50602564 0.16410707
## 25 1e+00 1e+00 0.06634615 0.03647186
## 26 5e+00 1e+00 0.06634615 0.03647186
## 27 1e+01 1e+00 0.06634615 0.03647186
## 28 1e+02 1e+00 0.06634615 0.03647186
## 29 1e-03 5e+00 0.54852564 0.04386961
## 30 1e-02 5e+00 0.54852564 0.04386961
## 31 1e-01 5e+00 0.54852564 0.04386961
## 32 1e+00 5e+00 0.51006410 0.06543909
## 33 5e+00 5e+00 0.50493590 0.06826383
## 34 1e+01 5e+00 0.50493590 0.06826383
## 35 1e+02 5e+00 0.50493590 0.06826383
## 36 1e-03 1e+01 0.55102564 0.03973118
## 37 1e-02 1e+01 0.55102564 0.03973118
## 38 1e-01 1e+01 0.55102564 0.03973118
## 39 1e+00 1e+01 0.53826923 0.06036697
## 40 5e+00 1e+01 0.52801282 0.06057234
## 41 1e+01 1e+01 0.52801282 0.06057234
## 42 1e+02 1e+01 0.52801282 0.06057234
## 43 1e-03 1e+02 0.55102564 0.03973118
## 44 1e-02 1e+02 0.55102564 0.03973118
## 45 1e-01 1e+02 0.55102564 0.03973118
## 46 1e+00 1e+02 0.55102564 0.03973118
## 47 5e+00 1e+02 0.55102564 0.03973118
## 48 1e+01 1e+02 0.55102564 0.03973118
## 49 1e+02 1e+02 0.55102564 0.03973118
best.rad.model=tune.out2$best.model
summary(best.rad.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
## gamma = c(0.001, 0.01, 0.1, 1, 5, 10, 100)), kernal = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.01
##
## Number of Support Vectors: 57
##
## ( 27 30 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
The radial-kernel model chosen by cross-validation has cost=100 and gamma=0.01. It gives the lowest cross-validation error for this kernel and has 57 support vectors: 27 from the "Low MPG" class and 30 from the "High MPG" class.
The next model that I will test is the polynomial-kernel SVM.
set.seed(15)
tune.out3=tune(svm, mpglevel~., data=Auto, kernal="polynomial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), degree=c(2,3,4,5)))
summary(tune.out3)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 2
##
## - best performance: 0.02032051
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-03 2 0.56634615 0.04940035
## 2 1e-02 2 0.56634615 0.04940035
## 3 1e-01 2 0.10705128 0.04919266
## 4 1e+00 2 0.07641026 0.02900605
## 5 5e+00 2 0.06865385 0.03533138
## 6 1e+01 2 0.04833333 0.03644675
## 7 1e+02 2 0.02032051 0.02602990
## 8 1e-03 3 0.56634615 0.04940035
## 9 1e-02 3 0.56634615 0.04940035
## 10 1e-01 3 0.10705128 0.04919266
## 11 1e+00 3 0.07641026 0.02900605
## 12 5e+00 3 0.06865385 0.03533138
## 13 1e+01 3 0.04833333 0.03644675
## 14 1e+02 3 0.02032051 0.02602990
## 15 1e-03 4 0.56634615 0.04940035
## 16 1e-02 4 0.56634615 0.04940035
## 17 1e-01 4 0.10705128 0.04919266
## 18 1e+00 4 0.07641026 0.02900605
## 19 5e+00 4 0.06865385 0.03533138
## 20 1e+01 4 0.04833333 0.03644675
## 21 1e+02 4 0.02032051 0.02602990
## 22 1e-03 5 0.56634615 0.04940035
## 23 1e-02 5 0.56634615 0.04940035
## 24 1e-01 5 0.10705128 0.04919266
## 25 1e+00 5 0.07641026 0.02900605
## 26 5e+00 5 0.06865385 0.03533138
## 27 1e+01 5 0.04833333 0.03644675
## 28 1e+02 5 0.02032051 0.02602990
best.poly.model=tune.out3$best.model
summary(best.poly.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
## degree = c(2, 3, 4, 5)), kernal = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.003205128
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
Based on the 10-fold cross-validation of the polynomial-kernel model, the best parameters are degree=2 and cost=100, with a cross-validation error of 0.02032051. (The error rates are identical across all degrees because the misspelled "kernal" argument again left svm() with its default radial kernel, gamma=0.003205128, so the degree parameter had no effect.) The best model has 63 support vectors, 30 from the "low MPG" class and 33 from the "high MPG" class.
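As a rough side-by-side comparison of the three cross-validation winners, their in-sample error rates on the full Auto data can be sketched as follows (these are training errors, not cross-validation errors):
sapply(list(linear.run=best.model, radial.run=best.rad.model, poly.run=best.poly.model),
       function(m) mean(predict(m, Auto) != Auto$mpglevel))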
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing
plot(svmfit , dat)
where svmfit contains your fitted model and dat is a data frame containing your data, you can type
plot(svmfit, dat, x1 ~ x4)
in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
svm.linear=svm(mpglevel~., data=Auto, kernal="linear", cost=1)
svm.rad=svm(mpglevel~., data=Auto, kernal="radial", cost=1, gamma=0.001)
svm.poly=svm(mpglevel~., data=Auto, kernal="polynomial", cost=1, degree=2)
plotpairs = function(autofit) {
for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
plot(autofit, Auto, as.formula(paste("mpg~", name, sep = "")))
}
}
plotpairs(svm.linear)
plotpairs(svm.rad)
plotpairs(svm.poly)
Across all of these plots, you can see how each variable relates to the high- and low-MPG classes. (Note that the misspelled "kernal" argument means all three of these fits also use svm()'s default radial kernel.)
detach(Auto)
This problem involves the OJ data set which is part of the ISLR package.
library(ISLR)
attach(OJ)
str(OJ)
## 'data.frame': 1070 obs. of 18 variables:
## $ Purchase : Factor w/ 2 levels "CH","MM": 1 1 1 2 1 1 1 1 1 1 ...
## $ WeekofPurchase: num 237 239 245 227 228 230 232 234 235 238 ...
## $ StoreID : num 1 1 1 1 7 7 7 7 7 7 ...
## $ PriceCH : num 1.75 1.75 1.86 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceMM : num 1.99 1.99 2.09 1.69 1.69 1.99 1.99 1.99 1.99 1.99 ...
## $ DiscCH : num 0 0 0.17 0 0 0 0 0 0 0 ...
## $ DiscMM : num 0 0.3 0 0 0 0 0.4 0.4 0.4 0.4 ...
## $ SpecialCH : num 0 0 0 0 0 0 1 1 0 0 ...
## $ SpecialMM : num 0 1 0 0 0 1 1 0 0 0 ...
## $ LoyalCH : num 0.5 0.6 0.68 0.4 0.957 ...
## $ SalePriceMM : num 1.99 1.69 2.09 1.69 1.69 1.99 1.59 1.59 1.59 1.59 ...
## $ SalePriceCH : num 1.75 1.75 1.69 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceDiff : num 0.24 -0.06 0.4 0 0 0.3 -0.1 -0.16 -0.16 -0.16 ...
## $ Store7 : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
## $ PctDiscMM : num 0 0.151 0 0 0 ...
## $ PctDiscCH : num 0 0 0.0914 0 0 ...
## $ ListPriceDiff : num 0.24 0.24 0.23 0 0 0.3 0.3 0.24 0.24 0.24 ...
## $ STORE : num 1 1 1 1 0 0 0 0 0 0 ...
set.seed(1002)
trainingindex=sample(dim(OJ)[1], 800)
OJ.train=OJ[trainingindex,]
OJ.test=OJ[-trainingindex,]
dim(OJ.train)
## [1] 800 18
dim(OJ.test)
## [1] 270 18
library(e1071)
OJ.svm.linear=svm(Purchase~., data= OJ.train, kernal="linear", cost=0.01)
summary(OJ.svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernal = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 606
##
## ( 306 300 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The fitted model uses 606 of the 800 training observations as support vectors: 306 from the CH class and 300 from the MM class. (Note that the kernel argument is again misspelled as "kernal", so svm() actually fit its default radial kernel with cost=0.01 rather than a linear support vector classifier.)
OJ.train.pred = predict(OJ.svm.linear, OJ.train)
table(OJ.train$Purchase, OJ.train.pred)
## OJ.train.pred
## CH MM
## CH 500 0
## MM 300 0
With such a small cost, the model predicts CH for every training observation, giving a training error rate of 300/800 = 37.5%.
OJ.test.pred=predict(OJ.svm.linear, OJ.test)
table(OJ.test$Purchase, OJ.test.pred)
## OJ.test.pred
## CH MM
## CH 153 0
## MM 117 0
The test set behaves the same way: every observation is predicted as CH, so the test error rate, 117/270 = 43.33%, is even higher.
set.seed(10)
OJ.tune.out=tune(svm, Purchase~., data=OJ.train, kernel="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10)))
summary(OJ.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.18375
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.36250 0.08057950
## 2 1e-02 0.18750 0.03435921
## 3 1e-01 0.18375 0.03821086
## 4 1e+00 0.18375 0.03821086
## 5 5e+00 0.18750 0.04039733
## 6 1e+01 0.18875 0.04267529
Tuning shows that the optimal cost is 0.1.
OJ.svm.linear=svm(Purchase~., kernel="linear", data=OJ.train, cost=OJ.tune.out$best.parameters$cost)
OJ.train.pred2=predict(OJ.svm.linear, OJ.train)
table(OJ.train$Purchase, OJ.train.pred2)
## OJ.train.pred2
## CH MM
## CH 438 62
## MM 83 217
With the tuned cost, the support vector classifier misclassifies (62 + 83)/800 = 18.125% of the training observations.
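These error rates can also be computed directly from the predictions rather than read off the confusion matrix by hand; a minimal sketch for the training error above (the same one-liner applies to the other tables):
mean(OJ.train.pred2 != OJ.train$Purchase)   # training error rate of the tuned classifier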
OJ.test.pred=predict(OJ.svm.linear, OJ.test)
table(OJ.test$Purchase, OJ.test.pred)
## OJ.test.pred
## CH MM
## CH 139 14
## MM 20 97
On the test set the classifier does even better, with a test error rate of (14 + 20)/270 = 12.59%.
set.seed(101)
OJ.svm.radial=svm(Purchase~., data=OJ.train, kernel="radial")
summary(OJ.svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.05555556
##
## Number of Support Vectors: 384
##
## ( 195 189 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The radial SVM uses the default cost of 1 and gamma of about 0.0556. It has 384 support vectors: 195 from the CH class and 189 from the MM class.
OJ.radial.train.pred=predict(OJ.svm.radial, OJ.train)
table(OJ.train$Purchase, OJ.radial.train.pred)
## OJ.radial.train.pred
## CH MM
## CH 455 45
## MM 86 214
On the training data, the radial model obtains a training error rate of (45 + 86)/800 = 16.375%.
OJ.radial.test.pred=predict(OJ.svm.radial, OJ.test)
table(OJ.test$Purchase, OJ.radial.test.pred)
## OJ.radial.test.pred
## CH MM
## CH 138 15
## MM 29 88
The radial SVM's test error rate, (15 + 29)/270 = about 16.3%, is very close to its training error rate.
set.seed(101)
rad.tune.out=tune(svm, Purchase~., data=OJ.train, kernel="radial", ranges = list(cost=10^seq(-2,1, by=0.25)))
summary(rad.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.5623413
##
## - best performance: 0.1825
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37500 0.06481812
## 2 0.01778279 0.37500 0.06481812
## 3 0.03162278 0.37500 0.06481812
## 4 0.05623413 0.23625 0.05604128
## 5 0.10000000 0.19125 0.05337563
## 6 0.17782794 0.18375 0.04528076
## 7 0.31622777 0.18500 0.05458174
## 8 0.56234133 0.18250 0.05109903
## 9 1.00000000 0.18500 0.05296750
## 10 1.77827941 0.18750 0.05170697
## 11 3.16227766 0.19375 0.04573854
## 12 5.62341325 0.19500 0.04794383
## 13 10.00000000 0.19875 0.05185785
Using the cost selected by cross-validation, we now see how the radial model predicts on the training and test data.
OJ.radial=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost)
OJ.rad.train.pred=predict(OJ.radial, OJ.train)
table(OJ.train$Purchase, OJ.rad.train.pred)
## OJ.rad.train.pred
## CH MM
## CH 455 45
## MM 85 215
On the training data, the tuned radial SVM has a training error rate of (45 + 85)/800 = 16.25%.
OJ.radial=svm(Purchase~., data=OJ.test, kernel="radial", cost=rad.tune.out$best.parameters$cost)
OJ.rad.test.pred=predict(OJ.radial, OJ.test)
table(OJ.test$Purchase, OJ.rad.test.pred)
## OJ.rad.test.pred
## CH MM
## CH 137 16
## MM 17 100
On the test set this gives an error rate of (16 + 17)/270 = about 12.22%, noticeably better than the training error. Note, however, that the chunk above refits the SVM on OJ.test before predicting, so this is not an honest test error; the model fit on OJ.train should be used to predict the test set, as sketched below.
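A sketch of that honest evaluation, refitting on the training data only and then predicting the held-out test set (OJ.radial.train is a name introduced here for illustration; its error will differ from the 12.22% above):
OJ.radial.train=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost)
mean(predict(OJ.radial.train, OJ.test) != OJ.test$Purchase)   # honest test error rate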
Cross-validation therefore improves the radial fit only slightly. Next, we repeat parts (b) through (e) using a support vector machine with a polynomial kernel, setting degree=2.
set.seed(8112)
OJ.svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
summary(svm.poly)
##
## Call:
## svm(formula = mpglevel ~ ., data = Auto, kernal = "polynomial",
## cost = 1, degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.003205128
##
## Number of Support Vectors: 174
##
## ( 86 88 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
Note that the summary printed above is of svm.poly, the earlier fit to the Auto data, rather than of OJ.svm.poly: its cost of 1, gamma of 0.003205, and 174 support vectors (split 86 and 88 between the classes) refer to the 0/1 mpglevel classes, not to CH and MM. The intended call is sketched below.
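A minimal sketch of the intended call, which would summarize the OJ polynomial fit itself (output not shown here):
summary(OJ.svm.poly)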
OJ.poly.train.pred = predict(OJ.svm.poly, OJ.train)
table(OJ.train$Purchase, OJ.poly.train.pred)
## OJ.poly.train.pred
## CH MM
## CH 467 33
## MM 115 185
With the default cost of 1, the polynomial SVM has a training error rate of (33 + 115)/800 = 18.5%.
OJ.poly.test.pred = predict(OJ.svm.poly, OJ.test)
table(OJ.test$Purchase, OJ.poly.test.pred)
## OJ.poly.test.pred
## CH MM
## CH 139 14
## MM 36 81
Similar to the training data, the original poly SVM achieved an approximate test error rate of 18.52%.
set.seed(101)
OJ.poly.tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2,
ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(OJ.poly.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.19
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37500 0.06481812
## 2 0.01778279 0.35125 0.06050999
## 3 0.03162278 0.34375 0.06187184
## 4 0.05623413 0.32125 0.06693955
## 5 0.10000000 0.31125 0.06755913
## 6 0.17782794 0.27750 0.06556379
## 7 0.31622777 0.21375 0.04945888
## 8 0.56234133 0.20750 0.05374838
## 9 1.00000000 0.20625 0.04611655
## 10 1.77827941 0.19625 0.03821086
## 11 3.16227766 0.19000 0.03476109
## 12 5.62341325 0.19875 0.03251602
## 13 10.00000000 0.19500 0.03496029
Based on the cross-validated model, the optimal cost is 3.162278, which achieves the lowest error rate of 0.19.
best.OJ.svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2, cost = OJ.poly.tune.out$best.parameters$cost)
OJ.poly.train.pred = predict(best.OJ.svm.poly, OJ.train)
table(OJ.train$Purchase, OJ.poly.train.pred)
## OJ.poly.train.pred
## CH MM
## CH 458 42
## MM 100 200
With the best parameters, the poly SVM performs better and achieves a training error rate of 17.75%.
OJ.poly.test.pred = predict(best.OJ.svm.poly, OJ.test)
table(OJ.test$Purchase, OJ.poly.test.pred)
## OJ.poly.test.pred
## CH MM
## CH 137 16
## MM 32 85
On the test data, the tuned polynomial SVM performs similarly to how it does on the training data, with a test error rate of (16 + 32)/270 = about 17.78%.
Overall, the radial-kernel SVM produced the lowest training error rate, and its test error was comparable to that of the tuned linear classifier, while the polynomial kernel performed worst on the test set; a side-by-side comparison is sketched below.
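A closing sketch that puts the three tuned OJ models' test error rates side by side (the radial model is refit on the training data for a fair comparison; oj.models is a name used only for this illustration):
oj.models=list(linear=OJ.svm.linear,        # tuned linear classifier from above
               radial=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost),
               poly=best.OJ.svm.poly)       # tuned degree-2 polynomial fit from above
sapply(oj.models, function(m) mean(predict(m, OJ.test) != OJ.test$Purchase))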