We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
x1=runif(500)-0.5
x2=runif(500)-.5
y=1*(x1^2-x2^2>0)
plot(x1[y==0], x2[y==0], col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(x1[y==1], x2[y==1], col="paleturquoise3", pch=16)
From the look of this plot, the decision boundary seems to be non-linear.
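Since y was generated as 1 whenever x1^2 - x2^2 > 0, the true boundary is the pair of diagonals x2 = x1 and x2 = -x1. As a minimal sketch (assuming the plot above is still the active device), they can be overlaid with:
abline(a=0, b=1, lty=2)    # true boundary: x2 = x1
abline(a=0, b=-1, lty=2)   # true boundary: x2 = -x1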
dat=data.frame(x1=x1,x2=x2, y=as.factor(y))
reg.fit=glm(y~., data=dat, family = "binomial")
summary(reg.fit)
##
## Call:
## glm(formula = y ~ ., family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.279 -1.189 1.078 1.143 1.254
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.03564 0.09011 0.396 0.692
## x1 -0.33740 0.31142 -1.083 0.279
## x2 0.11281 0.30999 0.364 0.716
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.95 on 499 degrees of freedom
## Residual deviance: 691.65 on 497 degrees of freedom
## AIC: 697.65
##
## Number of Fisher Scoring iterations: 3
Looking at the logistic regression summary, neither variable is significant for predicting y. This is expected: y depends on x1 and x2 only through \(X_1^2 - X_2^2\), so there is no linear relationship for the model to detect.
reg.prob=predict(reg.fit, newdata=dat, type="response")
reg.pred=ifelse(reg.prob>=0.5, 1, 0)
data.positive=dat[reg.pred==1,]
data.negative=dat[reg.pred==0,]
plot(data.positive$x1, data.positive$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(data.negative$x1, data.negative$x2, col="paleturquoise3", pch=16)
As expected, the fitted decision boundary is linear, but the model assigns nearly all of the observations to a single class.
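A minimal sketch to quantify this, tabulating the linear logistic predictions against the true classes:
table(predicted=reg.pred, actual=y)   # most observations fall in one predicted class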
reg.fit2=glm(y~poly(x1,3) + poly(x1,3) + I(x1*x2), data=dat, family = "binomial")
summary(reg.fit2)
##
## Call:
## glm(formula = y ~ poly(x1, 3) + poly(x1, 3) + I(x1 * x2), family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6396 -0.7200 0.2027 0.7320 1.9134
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2020 0.1160 1.742 0.0814 .
## poly(x1, 3)1 -2.4663 3.3272 -0.741 0.4585
## poly(x1, 3)2 36.6567 3.3184 11.047 <2e-16 ***
## poly(x1, 3)3 0.7005 3.3769 0.207 0.8357
## I(x1 * x2) 0.6423 1.4974 0.429 0.6679
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 692.95 on 499 degrees of freedom
## Residual deviance: 493.35 on 495 degrees of freedom
## AIC: 503.35
##
## Number of Fisher Scoring iterations: 5
For this non-linear logistic regression model, only the quadratic term in \(X_1\) (the poly(x1, 3)2 coefficient) is significant, which makes sense because y was generated from \(X_1^2 - X_2^2\). Note that the formula includes poly(x1, 3) twice rather than poly(x2, 3), so no polynomial terms in \(X_2\) enter the model.
reg.prob2=predict(reg.fit2, newdata=dat, type = "response")
reg.pred2=ifelse(reg.prob2>=0.6, 1, 0)
data.positive2=dat[reg.pred2==1,]
data.negative2=dat[reg.pred2==0,]
plot(data.positive2$x1, data.positive2$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(data.negative2$x1, data.negative2$x2, col="paleturquoise3", pch=16)
This non-linear decision boundary is completely different from the boundaries found before: it splits the data roughly through the horizontal middle and bears some resemblance to the class separation in the original data plot.
library(e1071)
svm.fit=svm(as.factor(y)~x1+x2, dat, kernel="linear", cost=1)
svm.pred=predict(svm.fit, dat)
svm.positive= dat[svm.pred==1,]
svm.negative= dat[svm.pred==0,]
plot(svm.positive$x1, svm.positive$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(svm.negative$x1, svm.negative$x2, col="paleturquoise3", pch=16)
Similar to the linear logistic model, the linear-kernel SVM assigns nearly all points to a single class, with only a roughly linear boundary toward the bottom of the plot.
library(e1071)
svm.fit2=svm(as.factor(y)~x1+x2, dat, kernel="radial", gamma=1, cost=1)
svm.pred2=predict(svm.fit2, dat)
svm.positive2= dat[svm.pred2==1,]
svm.negative2= dat[svm.pred2==0,]
plot(svm.positive2$x1, svm.positive2$x2, col="palevioletred3", xlab="X1", ylab="X2", pch=18)
points(svm.negative2$x1, svm.negative2$x2, col="paleturquoise3", pch=16)
For the non-linear support vector machine, I used the radial kernel. The radial kernel produces a much crisper decision boundary and essentially recovers the true boundary in the data: compared with the original plot there is far less mixing between the classes, and you could almost draw two perpendicular diagonal lines to separate them.
The polynomial logistic regression model with the interaction term and the radial-kernel SVM were the most effective at recovering the non-linear decision boundary, while the basic logistic regression and the linear-kernel SVM performed poorly.
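As a rough numerical check on this comparison, the training accuracy of each fit can be computed from the predictions already in hand (a minimal sketch):
c(logit.linear = mean(reg.pred == y),    # linear logistic regression
  logit.poly   = mean(reg.pred2 == y),   # polynomial logistic regression
  svm.linear   = mean(svm.pred == y),    # linear-kernel SVM
  svm.radial   = mean(svm.pred2 == y))   # radial-kernel SVM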
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
library(ISLR)
attach(Auto)
summary(Auto)
## mpg cylinders displacement horsepower
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0
##
## weight acceleration year origin
## Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
## 1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000
## Median :2804 Median :15.50 Median :76.00 Median :1.000
## Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577
## 3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000
## Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
##
## name
## amc matador : 5
## ford pinto : 5
## toyota corolla : 5
## amc gremlin : 4
## amc hornet : 4
## chevrolet chevette: 4
## (Other) :365
gas.median=median(Auto$mpg)
gas.class=ifelse(Auto$mpg>gas.median, 1, 0)
Auto$mpglevel=as.factor(gas.class)
str(Auto$mpglevel)
## Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
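A quick sketch to confirm that the median split gives two roughly equal classes:
table(Auto$mpglevel)   # counts of low-MPG (0) and high-MPG (1) cars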
set.seed(10)
tune.out=tune(svm, mpglevel~., data=Auto, kernal="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1,5,10,100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.01269231
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.50602564 0.16410707
## 2 1e-02 0.50602564 0.16410707
## 3 1e-01 0.10461538 0.06651363
## 4 1e+00 0.07647436 0.03377427
## 5 5e+00 0.06621795 0.04795478
## 6 1e+01 0.05346154 0.04378538
## 7 1e+02 0.01269231 0.01783081
Based on the cross-validation summary, the lowest error rate occurs at cost=100, not at a smaller cost; the best-model summary below gives more detail. (Note that because the kernel argument is misspelled as "kernal", svm() silently ignores it and uses its default radial kernel, which is why that summary reports a radial kernel.) The cross-validation error decreases steadily with cost: 0.50602564 for cost=0.001 and cost=0.01, 0.10461538 for cost=0.1, 0.07647436 for cost=1, 0.06621795 for cost=5, 0.05346154 for cost=10, and a minimum of 0.01269231 for cost=100.
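For reference, here is a sketch of the tuning call with the kernel argument spelled correctly (tune.linear is a name used only for illustration; its cross-validation results would differ from the output above):
tune.linear=tune(svm, mpglevel~., data=Auto, kernel="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.linear)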
best.model=tune.out$best.model
summary(best.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernal = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.003205128
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
The best model among the candidate cost values is the one with cost=100. As the summary shows, it was fit with a radial kernel (gamma=0.003205128) and has 63 support vectors: 30 from the "low MPG" class (coded 0) and 33 from the "high MPG" class (coded 1).
The first model that I will test is the radial-kernel SVM.
set.seed(10)
tune.out2=tune(svm, mpglevel~., data=Auto, kernal="radial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), gamma=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out2)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 100 0.01
##
## - best performance: 0.01275641
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-03 1e-03 0.50852564 0.15639743
## 2 1e-02 1e-03 0.50852564 0.15639743
## 3 1e-01 1e-03 0.47019231 0.15169894
## 4 1e+00 1e-03 0.09179487 0.04824406
## 5 5e+00 1e-03 0.07391026 0.03469713
## 6 1e+01 1e-03 0.07141026 0.03740622
## 7 1e+02 1e-03 0.02794872 0.02510152
## 8 1e-03 1e-02 0.50602564 0.16410707
## 9 1e-02 1e-02 0.50602564 0.16410707
## 10 1e-01 1e-02 0.08923077 0.04516267
## 11 1e+00 1e-02 0.07141026 0.03740622
## 12 5e+00 1e-02 0.05096154 0.03977718
## 13 1e+01 1e-02 0.02282051 0.02780768
## 14 1e+02 1e-02 0.01275641 0.01344780
## 15 1e-03 1e-01 0.50602564 0.16410707
## 16 1e-02 1e-01 0.18634615 0.08991135
## 17 1e-01 1e-01 0.07903846 0.03260217
## 18 1e+00 1e-01 0.05089744 0.04627556
## 19 5e+00 1e-01 0.02544872 0.01688453
## 20 1e+01 1e-01 0.02038462 0.01617396
## 21 1e+02 1e-01 0.02557692 0.02417544
## 22 1e-03 1e+00 0.50602564 0.16410707
## 23 1e-02 1e+00 0.50602564 0.16410707
## 24 1e-01 1e+00 0.50602564 0.16410707
## 25 1e+00 1e+00 0.06634615 0.03647186
## 26 5e+00 1e+00 0.06634615 0.03647186
## 27 1e+01 1e+00 0.06634615 0.03647186
## 28 1e+02 1e+00 0.06634615 0.03647186
## 29 1e-03 5e+00 0.54852564 0.04386961
## 30 1e-02 5e+00 0.54852564 0.04386961
## 31 1e-01 5e+00 0.54852564 0.04386961
## 32 1e+00 5e+00 0.51006410 0.06543909
## 33 5e+00 5e+00 0.50493590 0.06826383
## 34 1e+01 5e+00 0.50493590 0.06826383
## 35 1e+02 5e+00 0.50493590 0.06826383
## 36 1e-03 1e+01 0.55102564 0.03973118
## 37 1e-02 1e+01 0.55102564 0.03973118
## 38 1e-01 1e+01 0.55102564 0.03973118
## 39 1e+00 1e+01 0.53826923 0.06036697
## 40 5e+00 1e+01 0.52801282 0.06057234
## 41 1e+01 1e+01 0.52801282 0.06057234
## 42 1e+02 1e+01 0.52801282 0.06057234
## 43 1e-03 1e+02 0.55102564 0.03973118
## 44 1e-02 1e+02 0.55102564 0.03973118
## 45 1e-01 1e+02 0.55102564 0.03973118
## 46 1e+00 1e+02 0.55102564 0.03973118
## 47 5e+00 1e+02 0.55102564 0.03973118
## 48 1e+01 1e+02 0.55102564 0.03973118
## 49 1e+02 1e+02 0.55102564 0.03973118
best.rad.model=tune.out2$best.model
summary(best.rad.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
## gamma = c(0.001, 0.01, 0.1, 1, 5, 10, 100)), kernal = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.01
##
## Number of Support Vectors: 57
##
## ( 27 30 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
The radial-kernel model chosen by cross-validation has cost=100 and gamma=0.01. It gives the lowest cross-validation error for this kernel and has 57 support vectors: 27 from the "Low MPG" class and 30 from the "High MPG" class.
The next model that I will test is the polynomial-kernel SVM.
set.seed(15)
tune.out3=tune(svm, mpglevel~., data=Auto, kernal="polynomial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10 ,100), degree=c(2,3,4,5)))
summary(tune.out3)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 100 2
##
## - best performance: 0.02032051
##
## - Detailed performance results:
## cost degree error dispersion
## 1 1e-03 2 0.56634615 0.04940035
## 2 1e-02 2 0.56634615 0.04940035
## 3 1e-01 2 0.10705128 0.04919266
## 4 1e+00 2 0.07641026 0.02900605
## 5 5e+00 2 0.06865385 0.03533138
## 6 1e+01 2 0.04833333 0.03644675
## 7 1e+02 2 0.02032051 0.02602990
## 8 1e-03 3 0.56634615 0.04940035
## 9 1e-02 3 0.56634615 0.04940035
## 10 1e-01 3 0.10705128 0.04919266
## 11 1e+00 3 0.07641026 0.02900605
## 12 5e+00 3 0.06865385 0.03533138
## 13 1e+01 3 0.04833333 0.03644675
## 14 1e+02 3 0.02032051 0.02602990
## 15 1e-03 4 0.56634615 0.04940035
## 16 1e-02 4 0.56634615 0.04940035
## 17 1e-01 4 0.10705128 0.04919266
## 18 1e+00 4 0.07641026 0.02900605
## 19 5e+00 4 0.06865385 0.03533138
## 20 1e+01 4 0.04833333 0.03644675
## 21 1e+02 4 0.02032051 0.02602990
## 22 1e-03 5 0.56634615 0.04940035
## 23 1e-02 5 0.56634615 0.04940035
## 24 1e-01 5 0.10705128 0.04919266
## 25 1e+00 5 0.07641026 0.02900605
## 26 5e+00 5 0.06865385 0.03533138
## 27 1e+01 5 0.04833333 0.03644675
## 28 1e+02 5 0.02032051 0.02602990
best.poly.model=tune.out3$best.model
summary(best.poly.model)
##
## Call:
## best.tune(method = svm, train.x = mpglevel ~ ., data = Auto,
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
## degree = c(2, 3, 4, 5)), kernal = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.003205128
##
## Number of Support Vectors: 63
##
## ( 30 33 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
Based on the 10-fold cross-validation of the polynomial-kernel model, the best parameters are degree=2 and cost=100, with a cross-validation error of 0.02032051. (The error rates are identical across all degrees because the misspelled "kernal" argument again left svm() with its default radial kernel, gamma=0.003205128, so the degree parameter had no effect.) The best model has 63 support vectors, 30 from the "low MPG" class and 33 from the "high MPG" class.
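As a rough side-by-side comparison of the three cross-validation winners, their in-sample error rates on the full Auto data can be sketched as follows (these are training errors, not cross-validation errors):
sapply(list(linear.run=best.model, radial.run=best.rad.model, poly.run=best.poly.model),
       function(m) mean(predict(m, Auto) != Auto$mpglevel))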
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing
plot(svmfit , dat)
where svmfit contains your fitted model and dat is a data frame containing your data, you can type
plot(svmfit, dat, x1 ~ x4)
in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
svm.linear=svm(mpglevel~., data=Auto, kernal="linear", cost=1)
svm.rad=svm(mpglevel~., data=Auto, kernal="radial", cost=1, gamma=0.001)
svm.poly=svm(mpglevel~., data=Auto, kernal="polynomial", cost=1, degree=2)
plotpairs = function(autofit) {
for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
plot(autofit, Auto, as.formula(paste("mpg~", name, sep = "")))
}
}
plotpairs(svm.linear)
plotpairs(svm.rad)
plotpairs(svm.poly)
Across all of these plots, you can see how each variable relates to the high- and low-MPG classes. (Note that the misspelled "kernal" argument means all three of these fits also use svm()'s default radial kernel.)
detach(Auto)
This problem involves the OJ data set which is part of the ISLR package.
library(ISLR)
attach(OJ)
str(OJ)
## 'data.frame': 1070 obs. of 18 variables:
## $ Purchase : Factor w/ 2 levels "CH","MM": 1 1 1 2 1 1 1 1 1 1 ...
## $ WeekofPurchase: num 237 239 245 227 228 230 232 234 235 238 ...
## $ StoreID : num 1 1 1 1 7 7 7 7 7 7 ...
## $ PriceCH : num 1.75 1.75 1.86 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceMM : num 1.99 1.99 2.09 1.69 1.69 1.99 1.99 1.99 1.99 1.99 ...
## $ DiscCH : num 0 0 0.17 0 0 0 0 0 0 0 ...
## $ DiscMM : num 0 0.3 0 0 0 0 0.4 0.4 0.4 0.4 ...
## $ SpecialCH : num 0 0 0 0 0 0 1 1 0 0 ...
## $ SpecialMM : num 0 1 0 0 0 1 1 0 0 0 ...
## $ LoyalCH : num 0.5 0.6 0.68 0.4 0.957 ...
## $ SalePriceMM : num 1.99 1.69 2.09 1.69 1.69 1.99 1.59 1.59 1.59 1.59 ...
## $ SalePriceCH : num 1.75 1.75 1.69 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
## $ PriceDiff : num 0.24 -0.06 0.4 0 0 0.3 -0.1 -0.16 -0.16 -0.16 ...
## $ Store7 : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
## $ PctDiscMM : num 0 0.151 0 0 0 ...
## $ PctDiscCH : num 0 0 0.0914 0 0 ...
## $ ListPriceDiff : num 0.24 0.24 0.23 0 0 0.3 0.3 0.24 0.24 0.24 ...
## $ STORE : num 1 1 1 1 0 0 0 0 0 0 ...
set.seed(1002)
trainingindex=sample(dim(OJ)[1], 800)
OJ.train=OJ[trainingindex,]
OJ.test=OJ[-trainingindex,]
dim(OJ.train)
## [1] 800 18
dim(OJ.test)
## [1] 270 18
library(e1071)
OJ.svm.linear=svm(Purchase~., data= OJ.train, kernal="linear", cost=0.01)
summary(OJ.svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernal = "linear",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
## gamma: 0.05555556
##
## Number of Support Vectors: 606
##
## ( 306 300 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The fitted model uses 606 of the 800 training observations as support vectors: 306 from the CH class and 300 from the MM class. (Note that the kernel argument is again misspelled as "kernal", so svm() actually fit its default radial kernel with cost=0.01 rather than a linear support vector classifier.)
OJ.train.pred = predict(OJ.svm.linear, OJ.train)
table(OJ.train$Purchase, OJ.train.pred)
## OJ.train.pred
## CH MM
## CH 500 0
## MM 300 0
With such a small cost, the model predicts CH for every training observation, giving a training error rate of 300/800 = 37.5%.
OJ.test.pred=predict(OJ.svm.linear, OJ.test)
table(OJ.test$Purchase, OJ.test.pred)
## OJ.test.pred
## CH MM
## CH 153 0
## MM 117 0
The test set behaves the same way: every observation is predicted as CH, so the test error rate, 117/270 = 43.33%, is even higher.
set.seed(10)
OJ.tune.out=tune(svm, Purchase~., data=OJ.train, kernel="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10)))
summary(OJ.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.18375
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.36250 0.08057950
## 2 1e-02 0.18750 0.03435921
## 3 1e-01 0.18375 0.03821086
## 4 1e+00 0.18375 0.03821086
## 5 5e+00 0.18750 0.04039733
## 6 1e+01 0.18875 0.04267529
Tuning shows that the optimal cost is 0.1.
OJ.svm.linear=svm(Purchase~., kernel="linear", data=OJ.train, cost=OJ.tune.out$best.parameters$cost)
OJ.train.pred2=predict(OJ.svm.linear, OJ.train)
table(OJ.train$Purchase, OJ.train.pred2)
## OJ.train.pred2
## CH MM
## CH 438 62
## MM 83 217
With the tuned cost, the support vector classifier misclassifies (62 + 83)/800 = 18.125% of the training observations.
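These error rates can also be computed directly from the predictions rather than read off the confusion matrix by hand; a minimal sketch for the training error above (the same one-liner applies to the other tables):
mean(OJ.train.pred2 != OJ.train$Purchase)   # training error rate of the tuned classifier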
OJ.test.pred=predict(OJ.svm.linear, OJ.test)
table(OJ.test$Purchase, OJ.test.pred)
## OJ.test.pred
## CH MM
## CH 139 14
## MM 20 97
On the test set the classifier does even better, with a test error rate of (14 + 20)/270 = 12.59%.
set.seed(101)
OJ.svm.radial=svm(Purchase~., data=OJ.train, kernel="radial")
summary(OJ.svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.05555556
##
## Number of Support Vectors: 384
##
## ( 195 189 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The radial SVM uses the default cost of 1 and gamma of about 0.0556. It has 384 support vectors: 195 from the CH class and 189 from the MM class.
OJ.radial.train.pred=predict(OJ.svm.radial, OJ.train)
table(OJ.train$Purchase, OJ.radial.train.pred)
## OJ.radial.train.pred
## CH MM
## CH 455 45
## MM 86 214
On the training data, the radial model obtains a training error rate of (45 + 86)/800 = 16.375%.
OJ.radial.test.pred=predict(OJ.svm.radial, OJ.test)
table(OJ.test$Purchase, OJ.radial.test.pred)
## OJ.radial.test.pred
## CH MM
## CH 138 15
## MM 29 88
The radial SVM's test error rate, (15 + 29)/270 = about 16.3%, is very close to its training error rate.
set.seed(101)
rad.tune.out=tune(svm, Purchase~., data=OJ.train, kernel="radial", ranges = list(cost=10^seq(-2,1, by=0.25)))
summary(rad.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.5623413
##
## - best performance: 0.1825
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37500 0.06481812
## 2 0.01778279 0.37500 0.06481812
## 3 0.03162278 0.37500 0.06481812
## 4 0.05623413 0.23625 0.05604128
## 5 0.10000000 0.19125 0.05337563
## 6 0.17782794 0.18375 0.04528076
## 7 0.31622777 0.18500 0.05458174
## 8 0.56234133 0.18250 0.05109903
## 9 1.00000000 0.18500 0.05296750
## 10 1.77827941 0.18750 0.05170697
## 11 3.16227766 0.19375 0.04573854
## 12 5.62341325 0.19500 0.04794383
## 13 10.00000000 0.19875 0.05185785
Using the cost selected by cross-validation, we now see how the radial model predicts on the training and test data.
OJ.radial=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost)
OJ.rad.train.pred=predict(OJ.radial, OJ.train)
table(OJ.train$Purchase, OJ.rad.train.pred)
## OJ.rad.train.pred
## CH MM
## CH 455 45
## MM 85 215
On the training data, the tuned radial SVM has a training error rate of (45 + 85)/800 = 16.25%.
OJ.radial=svm(Purchase~., data=OJ.test, kernel="radial", cost=rad.tune.out$best.parameters$cost)
OJ.rad.test.pred=predict(OJ.radial, OJ.test)
table(OJ.test$Purchase, OJ.rad.test.pred)
## OJ.rad.test.pred
## CH MM
## CH 137 16
## MM 17 100
On the test set this gives an error rate of (16 + 17)/270 = about 12.22%, noticeably better than the training error. Note, however, that the chunk above refits the SVM on OJ.test before predicting, so this is not an honest test error; the model fit on OJ.train should be used to predict the test set, as sketched below.
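A sketch of that honest evaluation, refitting on the training data only and then predicting the held-out test set (OJ.radial.train is a name introduced here for illustration; its error will differ from the 12.22% above):
OJ.radial.train=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost)
mean(predict(OJ.radial.train, OJ.test) != OJ.test$Purchase)   # honest test error rate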
Cross-validation therefore improves the radial fit only slightly. Next, we repeat parts (b) through (e) using a support vector machine with a polynomial kernel, setting degree=2.
set.seed(8112)
OJ.svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2)
summary(svm.poly)
##
## Call:
## svm(formula = mpglevel ~ ., data = Auto, kernal = "polynomial",
## cost = 1, degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.003205128
##
## Number of Support Vectors: 174
##
## ( 86 88 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
Note that the summary printed above is of svm.poly, the earlier fit to the Auto data, rather than of OJ.svm.poly: its cost of 1, gamma of 0.003205, and 174 support vectors (split 86 and 88 between the classes) refer to the 0/1 mpglevel classes, not to CH and MM. The intended call is sketched below.
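A minimal sketch of the intended call, which would summarize the OJ polynomial fit itself (output not shown here):
summary(OJ.svm.poly)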
OJ.poly.train.pred = predict(OJ.svm.poly, OJ.train)
table(OJ.train$Purchase, OJ.poly.train.pred)
## OJ.poly.train.pred
## CH MM
## CH 467 33
## MM 115 185
With the default cost of 1, the polynomial SVM has a training error rate of (33 + 115)/800 = 18.5%.
OJ.poly.test.pred = predict(OJ.svm.poly, OJ.test)
table(OJ.test$Purchase, OJ.poly.test.pred)
## OJ.poly.test.pred
## CH MM
## CH 139 14
## MM 36 81
Similar to the training data, the original poly SVM achieved an approximate test error rate of 18.52%.
set.seed(101)
OJ.poly.tune.out = tune(svm, Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2,
ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(OJ.poly.tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.19
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.37500 0.06481812
## 2 0.01778279 0.35125 0.06050999
## 3 0.03162278 0.34375 0.06187184
## 4 0.05623413 0.32125 0.06693955
## 5 0.10000000 0.31125 0.06755913
## 6 0.17782794 0.27750 0.06556379
## 7 0.31622777 0.21375 0.04945888
## 8 0.56234133 0.20750 0.05374838
## 9 1.00000000 0.20625 0.04611655
## 10 1.77827941 0.19625 0.03821086
## 11 3.16227766 0.19000 0.03476109
## 12 5.62341325 0.19875 0.03251602
## 13 10.00000000 0.19500 0.03496029
Based on the cross-validated model, the optimal cost is 3.162278, which achieves the lowest error rate of 0.19.
best.OJ.svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "poly", degree = 2, cost = OJ.poly.tune.out$best.parameters$cost)
OJ.poly.train.pred = predict(best.OJ.svm.poly, OJ.train)
table(OJ.train$Purchase, OJ.poly.train.pred)
## OJ.poly.train.pred
## CH MM
## CH 458 42
## MM 100 200
With the best parameters, the poly SVM performs better and achieves a training error rate of 17.75%.
OJ.poly.test.pred = predict(best.OJ.svm.poly, OJ.test)
table(OJ.test$Purchase, OJ.poly.test.pred)
## OJ.poly.test.pred
## CH MM
## CH 137 16
## MM 32 85
On the test data, the tuned polynomial SVM performs similarly to how it does on the training data, with a test error rate of (16 + 32)/270 = about 17.78%.
Overall, the radial-kernel SVM produced the lowest training error rate, and its test error was comparable to that of the tuned linear classifier, while the polynomial kernel performed worst on the test set; a side-by-side comparison is sketched below.
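A closing sketch that puts the three tuned OJ models' test error rates side by side (the radial model is refit on the training data for a fair comparison; oj.models is a name used only for this illustration):
oj.models=list(linear=OJ.svm.linear,        # tuned linear classifier from above
               radial=svm(Purchase~., data=OJ.train, kernel="radial", cost=rad.tune.out$best.parameters$cost),
               poly=best.OJ.svm.poly)       # tuned degree-2 polynomial fit from above
sapply(oj.models, function(m) mean(predict(m, OJ.test) != OJ.test$Purchase))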