Produce a plot of test MSE (as in Figure 8.8 in the text) as a function of number of trees for Bagging and Random Forests.
We use 200 observations from our data set as the test set and the remaining 307 as the training set. For bagging, I use mtry = 21, which indicates that all 21 predictors should be considered for each split of each tree. For the random forest of regression trees, I use mtry = p/3 = 21/3 = 7.
load("/Users/shijiabian/Downloads/body.RData")
genderFact <- as.factor(Y$Gender)
set.seed(1)
# hold out 200 observations as the test set; the remaining 307 form the training set
test = sample(1:nrow(X), 200)
train = (1:nrow(X))[-test]
install.packages("randomForest")
## Error: trying to use CRAN without setting a mirror
library(randomForest)
## randomForest 4.6-7
## Type rfNews() to see new features/changes/bug fixes.
set.seed(1)
# record the test MSE of bagging and random forests for ntree = 5, 10, ..., 500
test.mse.bag <- numeric(100)
test.mse.rf <- numeric(100)
xvalue <- c(1:100) * 5
for (i in 1:100) {
bag.rdata <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 21,
importance = TRUE, ntree = i * 5)
rf.rdata <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 7,
importance = TRUE, ntree = i * 5)
yhat.bag <- predict(bag.rdata, newdata = X[test, ])
yhat.rf <- predict(rf.rdata, newdata = X[test, ])
test.mse.bag[i] <- mean((yhat.bag - Y[test, ]$Weight)^2)
test.mse.rf[i] <- mean((yhat.rf - Y[test, ]$Weight)^2)
}
plot(xvalue, test.mse.bag, xlab = "Number of Trees", ylab = "MSE", col = "blue",
    pch = 19, type = "l", ylim = c(9.5, 17))
points(xvalue, test.mse.rf, pch = 19, type = "l")
legend("topright", legend = c("Test MSE: Bagging", "Test MSE: Random Forest"),
    col = c("blue", "black"), lty = 1)
Which variables does your random forest identify as most important? How do they compare with the most important variables as identified by Bagging?
Here I still use the same training set, since it makes up most of the data, and this time I use the default number of trees, 500. For the random forest, in terms of %IncMSE, Knee.Girth (21.51) is identified as the most important variable; in terms of IncNodePurity, Waist.Girth (11250.7) is identified as the most important variable.
rf.rdata <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 7, importance = TRUE)
importance(rf.rdata)
## %IncMSE IncNodePurity
## Wrist.Diam 5.469 725.7
## Wrist.Girth 6.819 1614.9
## Forearm.Girth 13.936 6731.5
## Elbow.Diam 8.188 740.7
## Bicep.Girth 13.333 8018.1
## Shoulder.Girth 11.399 3378.4
## Biacromial.Diam 10.247 280.6
## Chest.Depth 7.922 780.6
## Chest.Diam 10.695 1485.1
## Chest.Girth 15.034 8396.5
## Navel.Girth 8.760 447.8
## Waist.Girth 20.812 11250.7
## Pelvic.Breadth 7.551 300.6
## Bitrochanteric.Diam 10.931 481.0
## Hip.Girth 20.654 1835.3
## Thigh.Girth 15.729 674.6
## Knee.Diam 13.310 777.8
## Knee.Girth 21.511 2121.3
## Calf.Girth 17.533 2465.8
## Ankle.Diam 8.277 199.0
## Ankle.Girth 10.741 455.3
varImpPlot(rf.rdata)
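The most important variables can also be extracted programmatically instead of being read off the table; a small sketch reusing the fitted rf.rdata from above:
# sort each importance measure and show the top three variables
imp <- importance(rf.rdata)
head(sort(imp[, "%IncMSE"], decreasing = TRUE), 3)
head(sort(imp[, "IncNodePurity"], decreasing = TRUE), 3)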
For bagging, in terms of %IncMSE, Waist.Girth (30.05) is identified as the most important variable; in terms of IncNodePurity, Waist.Girth (22041.3) is again identified as the most important variable.
bag.rdata = randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 21,
importance = TRUE)
importance(bag.rdata)
## %IncMSE IncNodePurity
## Wrist.Diam 5.014 109.8
## Wrist.Girth 5.795 243.1
## Forearm.Girth 11.680 5550.1
## Elbow.Diam 8.381 214.3
## Bicep.Girth 12.542 8097.6
## Shoulder.Girth 11.301 987.1
## Biacromial.Diam 11.128 226.6
## Chest.Depth 8.748 188.8
## Chest.Diam 11.219 710.9
## Chest.Girth 14.705 7006.7
## Navel.Girth 5.201 147.9
## Waist.Girth 30.049 22041.3
## Pelvic.Breadth 5.993 142.5
## Bitrochanteric.Diam 5.577 280.7
## Hip.Girth 22.912 1518.4
## Thigh.Girth 15.685 383.9
## Knee.Diam 15.738 784.2
## Knee.Girth 26.118 2219.1
## Calf.Girth 18.953 1678.5
## Ankle.Diam 6.978 154.7
## Ankle.Girth 9.972 298.2
varImpPlot(bag.rdata)
Compare the test error of your random forest (with 500 trees) against the test errors of the three methods you evaluated in problem 3(f) on Homework 3. Does your random forest make better predictions than your predictions from Homework 3?
The test MSE given by the random forest with 500 trees is 10.77. This is slightly worse than the errors from 3(f): the PLS model had a test error around 8.65, PCR had 9.27, and forward stepwise 8.63. So the random forest does not make better predictions than those from Homework 3.
set.seed(123)
rf.rdata.500 <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 7,
importance = TRUE, ntree = 500)
yhat.rf.500 <- predict(rf.rdata.500, newdata = X[test, ])
test.mse.rf.500 <- mean((yhat.rf.500 - Y$Weight[test])^2)
test.mse.rf.500
## [1] 10.77
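For reference, a sketch of how the Homework 3 test error for PLS could be recomputed on the same split, assuming the pls package is installed; the number of components here is a placeholder, not the value actually tuned in Homework 3:
library(pls)
body.df <- data.frame(Weight = Y$Weight, X)
# PLS fit on the training observations; ncomp = 5 is an assumed placeholder
pls.fit <- plsr(Weight ~ ., data = body.df, subset = train, scale = TRUE)
yhat.pls <- predict(pls.fit, body.df[test, ], ncomp = 5)
mean((yhat.pls - Y$Weight[test])^2)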
The randomForest() function uses 500 as the default number of trees. For this problem, is 500 enough trees? How can you tell?
500 trees are enough. I plotted the test MSEs of the random forest and bagging for tree counts from 501 to 550. The MSEs for the two methods are almost constant over this range and show no large fluctuations.
test.mse.rf.500 <- numeric(50)
test.mse.bag.500 <- numeric(50)
xvalue <- 501:550
for (i in 501:550) {
rf.rdata <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 7,
importance = TRUE, ntree = i)
bag.rdata <- randomForest(Y$Weight ~ ., data = X, subset = train, mtry = 21,
importance = TRUE, ntree = i)
yhat.rf <- predict(rf.rdata, newdata = X[test, ])
test.mse.rf.500[i - 500] <- mean((yhat.rf - Y[test, ]$Weight)^2)
yhat.bag <- predict(bag.rdata, newdata = X[test, ])
test.mse.bag.500[i - 500] <- mean((yhat.bag - Y[test, ]$Weight)^2)
}
plot(xvalue, test.mse.rf.500, xlab = "Number of Trees", ylab = "MSE", col = "blue",
    pch = 19, type = "l", ylim = c(9.5, 17))
points(xvalue, test.mse.bag.500, col = "green", pch = 19, type = "l")
legend("topright", legend = c("Test MSE: Random Forest", "Test MSE: Bagging"),
    col = c("blue", "green"), lty = 1)
We are given n = 7 observations in p = 2 dimensions. For each observation, there is an associated class label. Sketch the observations.
x1 <- c(3, 2, 4, 1, 2, 4, 4)
x2 <- c(4, 2, 4, 4, 1, 3, 1)
y <- c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue")
y <- as.factor(y)
data <- data.frame(x1, x2, y)
Here is how I sketch the observations:
attach(data)
## The following objects are masked _by_ .GlobalEnv:
##
## x1, x2, y
plot(x1, x2, col = c("blue", "red")[y], pch = 20)
detach(data)
Sketch the optimal separating hyperplane, and provide the equation for this hyperplane.
The equation for this hyperplane is x1 - x2 - 0.5 = 0. The optimal separating hyperplane is sketched in green.
attach(data)
## The following objects are masked _by_ .GlobalEnv:
##
## x1, x2, y
plot(x1, x2, col = c("blue", "red")[y], pch = 20)
abline(-0.5, 1, col = "green")
detach(data)
Describe the classification rule for the maximal margin classifier.
Classify to Red if x1 - x2 - 0.5 < 0, and classify to Blue otherwise. That is, \( \beta_0 = -0.5, \beta_1 = 1, \beta_2 = -1 \), and we classify to Red when \( \beta_0 + \beta_1 x_1 + \beta_2 x_2 < 0 \). (Note the direction: the Red points all satisfy x1 - x2 - 0.5 < 0, e.g. (3, 4) gives -1.5, while the Blue points all give positive values.)
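As a quick sanity check, applying this rule to the seven points reproduces all of the labels:
# the rule classifies every observation correctly
pred <- ifelse(x1 - x2 - 0.5 < 0, "Red", "Blue")
all(pred == as.character(y))
## [1] TRUE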
On your sketch, indicate the margin for the maximal margin hyperplane. How wide is the margin?
The margin is the minimal distance from the observations to the hyperplane: the perpendicular distance from the points on the dashed lines to the optimal hyperplane. For the support vector (2, 1), this distance is \( |2 - 1 - 0.5| / \sqrt{1^2 + (-1)^2} = \frac{1}{2\sqrt{2}} \), so the margin is \( \frac{1}{2\sqrt{2}} \) in our problem.
attach(data)
## The following objects are masked _by_ .GlobalEnv:
##
## x1, x2, y
plot(x1, x2, col = c("blue", "red")[y], pch = 20)
abline(-0.5, 1, col = "green")
abline(-1, 1, col = "black", lty = 2)
abline(0, 1, col = "black", lty = 2)
detach(data)
Indicate the support vectors for the maximal margin classifier.
The support vectors are observations 2, 3, 5, and 6 in the table.
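This can be confirmed with e1071 (a sketch, assuming the package is available): a nearly hard-margin fit with a very large cost approximates the maximal margin classifier and should report observations 2, 3, 5, and 6 as its support vectors.
library(e1071)
# with a very large cost, almost no margin violations are tolerated
fit <- svm(y ~ x1 + x2, data = data, kernel = "linear", cost = 1e+05, scale = FALSE)
fit$index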
Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.
The maximal margin hyperplane depends directly on only a small subset of the observations, the support vectors. The seventh observation is far from the maximal margin hyperplane and is not a support vector, so a slight movement of it would not affect the separating hyperplane: as long as it stays outside the margin, the support vectors, and hence the hyperplane, are unchanged.
Sketch a hyperplane that is not the optimal separating hyperplane, and provide the equation for this hyperplane.
I plotted this hyperplane in orange. Its equation is 1.1 x1 - x2 - 0.6 = 0.
attach(data)
## The following objects are masked _by_ .GlobalEnv:
##
## x1, x2, y
plot(x1, x2, col = c("blue", "red")[y], pch = 20)
abline(-0.5, 1, col = "green")
abline(-0.6, 1.1, col = "orange")
detach(data)
Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.
Here I add a blue point at (2, 3.5); the two classes are then no longer separable by a hyperplane.
x1 <- c(3, 2, 4, 1, 2, 4, 4, 2)
x2 <- c(4, 2, 4, 4, 1, 3, 1, 3.5)
y <- c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue")
y <- as.factor(y)
newdata <- data.frame(x1, x2, y)
attach(newdata)
## The following objects are masked _by_ .GlobalEnv:
##
## x1, x2, y
plot(x1, x2, col = c("blue", "red")[y], pch = 20)
detach(newdata)
Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
Here I created a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
install.packages("ISLR")
## Error: trying to use CRAN without setting a mirror
install.packages("e1071")
## Error: trying to use CRAN without setting a mirror
library(ISLR)
library(e1071)
## Loading required package: class
attach(OJ)
head(OJ)
## Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM SpecialCH
## 1 CH 237 1 1.75 1.99 0.00 0.0 0
## 2 CH 239 1 1.75 1.99 0.00 0.3 0
## 3 CH 245 1 1.86 2.09 0.17 0.0 0
## 4 MM 227 1 1.69 1.69 0.00 0.0 0
## 5 CH 228 7 1.69 1.69 0.00 0.0 0
## 6 CH 230 7 1.69 1.99 0.00 0.0 0
## SpecialMM LoyalCH SalePriceMM SalePriceCH PriceDiff Store7 PctDiscMM
## 1 0 0.5000 1.99 1.75 0.24 No 0.0000
## 2 1 0.6000 1.69 1.75 -0.06 No 0.1508
## 3 0 0.6800 2.09 1.69 0.40 No 0.0000
## 4 0 0.4000 1.69 1.69 0.00 No 0.0000
## 5 0 0.9565 1.69 1.69 0.00 Yes 0.0000
## 6 1 0.9652 1.99 1.69 0.30 Yes 0.0000
## PctDiscCH ListPriceDiff STORE
## 1 0.0000 0.24 1
## 2 0.0000 0.24 1
## 3 0.0914 0.23 1
## 4 0.0000 0.00 1
## 5 0.0000 0.00 0
## 6 0.0000 0.30 0
dim(OJ)
## [1] 1070 18
# 1070 18
set.seed(345)
train = sample(1:nrow(OJ), 800)
test = c(1:nrow(OJ))[-train]
Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the SVM, and describe the results obtained.
With cost equal to 0.01, there are 603 support vectors: 302 in one class and 301 in the other. The gamma is 0.05556.
set.seed(235)
# OJ$Purchase is already a factor; binding it again adds a duplicate column
# named Purchase.1, and the original Purchase column is then dropped
Purchase <- as.factor(Purchase)
OJ.df <- data.frame(OJ, Purchase)
OJ.df <- OJ.df[, -1]
svmfit <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "linear", cost = 0.01,
scale = FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "linear",
## cost = 0.01, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
## gamma: 0.05556
##
## Number of Support Vectors: 603
##
## ( 302 301 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
What are the training and test error rates?
For the training error rate:
truth
predict CH MM
CH 457 149
MM 41 153
Training error rate = 190/800 = 0.2375. The training error rate is expected to be smaller than the test error rate in most cases.
purchasepred.train <- predict(svmfit, OJ.df[train, ])
table(predict = purchasepred.train, truth = OJ.df[train, ]$Purchase.1)
## truth
## predict CH MM
## CH 457 149
## MM 41 153
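Instead of reading the error rate off the table, it can also be computed directly:
# proportion of misclassified training observations
mean(purchasepred.train != OJ.df[train, ]$Purchase.1)
## [1] 0.2375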
For the test error rate:
truth
predict CH MM
CH 146 64
MM 9 51
Test error rate = 73/270 = 0.2704 (the result may differ each time I knit the HTML, but the training error rate is expected to be smaller than the test error rate in most cases).
purchasepred.test <- predict(svmfit, OJ.df[test, ])
table(predict = purchasepred.test, truth = OJ.df[test, ]$Purchase.1)
## truth
## predict CH MM
## CH 146 64
## MM 9 51
Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
cost = 1.927 (with gamma = 0.05556) results in the lowest cross-validation error rate, 0.1664.
set.seed(365)
# note: this tuning call uses the full OJ.df rather than only the training rows;
# tuning on OJ.df[train, ] would be the more standard choice
tune.out = tune(svm, Purchase.1 ~ ., data = OJ.df, kernel = "linear", ranges = list(cost = c(seq(0.01,
    10, length.out = 100))))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1.927
##
## - best performance: 0.1664
##
## - Detailed performance results:
## cost error dispersion
## 1 0.0100 0.1701 0.02561
## 2 0.1109 0.1729 0.02423
## 3 0.2118 0.1710 0.02645
## 4 0.3127 0.1748 0.02645
## 5 0.4136 0.1720 0.02539
## 6 0.5145 0.1710 0.02455
## 7 0.6155 0.1701 0.02281
## 8 0.7164 0.1729 0.02540
## 9 0.8173 0.1729 0.02540
## 10 0.9182 0.1738 0.02962
## 11 1.0191 0.1729 0.02830
## 12 1.1200 0.1710 0.02924
## 13 1.2209 0.1701 0.03014
## 14 1.3218 0.1682 0.02821
## 15 1.4227 0.1682 0.02821
## 16 1.5236 0.1692 0.02970
## 17 1.6245 0.1692 0.02970
## 18 1.7255 0.1701 0.02779
## 19 1.8264 0.1682 0.02531
## 20 1.9273 0.1664 0.02636
## 21 2.0282 0.1682 0.02889
## 22 2.1291 0.1673 0.02623
## 23 2.2300 0.1682 0.02786
## 24 2.3309 0.1692 0.02870
## 25 2.4318 0.1692 0.02870
## 26 2.5327 0.1692 0.02870
## 27 2.6336 0.1692 0.02870
## 28 2.7345 0.1692 0.02870
## 29 2.8355 0.1692 0.02870
## 30 2.9364 0.1692 0.02870
## 31 3.0373 0.1692 0.02870
## 32 3.1382 0.1692 0.02870
## 33 3.2391 0.1692 0.02870
## 34 3.3400 0.1701 0.02882
## 35 3.4409 0.1710 0.02957
## 36 3.5418 0.1710 0.02957
## 37 3.6427 0.1710 0.02957
## 38 3.7436 0.1710 0.02957
## 39 3.8445 0.1710 0.02957
## 40 3.9455 0.1701 0.02982
## 41 4.0464 0.1692 0.02970
## 42 4.1473 0.1692 0.02970
## 43 4.2482 0.1692 0.02970
## 44 4.3491 0.1692 0.02970
## 45 4.4500 0.1692 0.02970
## 46 4.5509 0.1692 0.02970
## 47 4.6518 0.1692 0.02970
## 48 4.7527 0.1692 0.02970
## 49 4.8536 0.1692 0.02970
## 50 4.9545 0.1692 0.02970
## 51 5.0555 0.1692 0.02970
## 52 5.1564 0.1692 0.02970
## 53 5.2573 0.1701 0.02779
## 54 5.3582 0.1701 0.02779
## 55 5.4591 0.1701 0.02779
## 56 5.5600 0.1701 0.02779
## 57 5.6609 0.1701 0.02779
## 58 5.7618 0.1710 0.02753
## 59 5.8627 0.1710 0.02753
## 60 5.9636 0.1710 0.02753
## 61 6.0645 0.1710 0.02753
## 62 6.1655 0.1701 0.02599
## 63 6.2664 0.1701 0.02599
## 64 6.3673 0.1701 0.02599
## 65 6.4682 0.1701 0.02599
## 66 6.5691 0.1701 0.02599
## 67 6.6700 0.1701 0.02599
## 68 6.7709 0.1701 0.02599
## 69 6.8718 0.1692 0.02510
## 70 6.9727 0.1692 0.02510
## 71 7.0736 0.1701 0.02523
## 72 7.1745 0.1701 0.02523
## 73 7.2755 0.1692 0.02732
## 74 7.3764 0.1692 0.02732
## 75 7.4773 0.1692 0.02732
## 76 7.5782 0.1710 0.02891
## 77 7.6791 0.1701 0.02744
## 78 7.7800 0.1701 0.02744
## 79 7.8809 0.1701 0.02744
## 80 7.9818 0.1701 0.02744
## 81 8.0827 0.1701 0.02744
## 82 8.1836 0.1692 0.02623
## 83 8.2845 0.1692 0.02623
## 84 8.3855 0.1692 0.02623
## 85 8.4864 0.1692 0.02623
## 86 8.5873 0.1692 0.02623
## 87 8.6882 0.1692 0.02623
## 88 8.7891 0.1701 0.02673
## 89 8.8900 0.1701 0.02673
## 90 8.9909 0.1701 0.02673
## 91 9.0918 0.1701 0.02673
## 92 9.1927 0.1710 0.02788
## 93 9.2936 0.1710 0.02788
## 94 9.3945 0.1710 0.02788
## 95 9.4955 0.1710 0.02788
## 96 9.5964 0.1710 0.02788
## 97 9.6973 0.1720 0.02896
## 98 9.7982 0.1720 0.02896
## 99 9.8991 0.1710 0.02788
## 100 10.0000 0.1720 0.02896
bestmod = tune.out$best.model
summary(bestmod)
##
## Call:
## best.tune(method = svm, train.x = Purchase.1 ~ ., data = OJ.df,
## ranges = list(cost = c(seq(0.01, 10, length.out = 100))),
## kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1.927
## gamma: 0.05556
##
## Number of Support Vectors: 443
##
## ( 221 222 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
After adopting the new cost, the error rates for both training and testing data decrease.
set.seed(334)
svmfit.newc <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "linear",
cost = bestmod$cost, scale = FALSE)
summary(svmfit.newc)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "linear",
## cost = bestmod$cost, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1.927
## gamma: 0.05556
##
## Number of Support Vectors: 342
##
## ( 172 170 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
For the training error rate with the new cost:
truth
predict CH MM
CH 444 80
MM 54 222
Training error rate = 134/800 = 0.1675 (the result may differ each time I knit the HTML).
purchasepred.train.newc <- predict(svmfit.newc, OJ.df[train, ])
table(predict = purchasepred.train.newc, truth = OJ.df[train, ]$Purchase.1)
## truth
## predict CH MM
## CH 444 80
## MM 54 222
For the test error rate with the new cost:
truth
predict CH MM
CH 142 34
MM 13 81
Test error rate = 47/270 = 0.1741 (the result may differ each time I knit the HTML).
purchasepred.test.newc <- predict(svmfit.newc, OJ.df[test, ])
table(predict = purchasepred.test.newc, truth = OJ.df[test, ]$Purchase.1)
## truth
## predict CH MM
## CH 142 34
## MM 13 81
Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
Repeat part b. The number of support vectors is 609: 307 in one class and 302 in the other.
install.packages("e1071")
## Error: trying to use CRAN without setting a mirror
library(e1071)
set.seed(345)
svmfit.svm <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "radial",
cost = 0.01)
summary(svmfit.svm)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "radial",
## cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
## gamma: 0.05556
##
## Number of Support Vectors: 609
##
## ( 307 302 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
Part c.
For the training error rate of the SVM:
truth
predict CH MM
CH 498 302
MM 0 0
Training error rate = 302/800 = 0.3775. With such a small cost the margin is very wide, and the classifier simply predicts the majority class (CH) for every observation; the training error is then just the share of MM observations.
purchasepred.train.svm <- predict(svmfit.svm, newdata = OJ.df[train, ])
table(pred = purchasepred.train.svm, true = OJ.df[train, ]$Purchase.1)
## true
## pred CH MM
## CH 498 302
## MM 0 0
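The 0.3775 training error is exactly the proportion of MM purchases in the training set, since the wide-margin classifier predicts CH for every observation:
# class proportions in the training set
prop.table(table(OJ.df[train, ]$Purchase.1))
## CH MM
## 0.6225 0.3775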
For the test error rate:
truth
predict CH MM
CH 155 115
MM 0 0
Test error rate = 115/270 = 0.4259 (the result may differ each time I knit the HTML).
purchasepred.test.svm <- predict(svmfit.svm, OJ.df[test, ])
table(pred = purchasepred.test.svm, true = OJ.df[test, ]$Purchase.1)
## true
## pred CH MM
## CH 155 115
## MM 0 0
Part d
cost = 0.6155 (with gamma = 0.05556) results in the lowest cross-validation error rate, 0.1713. The best model has 389 support vectors (197 and 192 in the two classes). (The result may differ each time I knit the HTML.)
set.seed(234)
tune.out.svm = tune(svm, Purchase.1 ~ ., data = OJ.df[train, ], kernel = "radial",
ranges = list(cost = c(seq(0.01, 10, length.out = 100))))
summary(tune.out.svm)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.6155
##
## - best performance: 0.1713
##
## - Detailed performance results:
## cost error dispersion
## 1 0.0100 0.3775 0.04888
## 2 0.1109 0.1825 0.03395
## 3 0.2118 0.1825 0.03238
## 4 0.3127 0.1750 0.03333
## 5 0.4136 0.1787 0.03538
## 6 0.5145 0.1738 0.03305
## 7 0.6155 0.1713 0.03283
## 8 0.7164 0.1713 0.03283
## 9 0.8173 0.1737 0.03654
## 10 0.9182 0.1762 0.03929
## 11 1.0191 0.1775 0.03717
## 12 1.1200 0.1800 0.03395
## 13 1.2209 0.1787 0.03587
## 14 1.3218 0.1762 0.03748
## 15 1.4227 0.1775 0.03670
## 16 1.5236 0.1775 0.03670
## 17 1.6245 0.1762 0.03408
## 18 1.7255 0.1787 0.03438
## 19 1.8264 0.1787 0.03438
## 20 1.9273 0.1775 0.03375
## 21 2.0282 0.1788 0.03336
## 22 2.1291 0.1775 0.03270
## 23 2.2300 0.1775 0.03270
## 24 2.3309 0.1775 0.03270
## 25 2.4318 0.1787 0.03336
## 26 2.5327 0.1750 0.03062
## 27 2.6336 0.1750 0.03062
## 28 2.7345 0.1738 0.02973
## 29 2.8355 0.1750 0.03227
## 30 2.9364 0.1762 0.03305
## 31 3.0373 0.1762 0.03305
## 32 3.1382 0.1762 0.03305
## 33 3.2391 0.1775 0.03051
## 34 3.3400 0.1787 0.03283
## 35 3.4409 0.1788 0.03388
## 36 3.5418 0.1775 0.03323
## 37 3.6427 0.1788 0.03438
## 38 3.7436 0.1788 0.03438
## 39 3.8445 0.1788 0.03438
## 40 3.9455 0.1788 0.03438
## 41 4.0464 0.1788 0.03438
## 42 4.1473 0.1800 0.03689
## 43 4.2482 0.1800 0.03689
## 44 4.3491 0.1813 0.03785
## 45 4.4500 0.1813 0.03785
## 46 4.5509 0.1800 0.03782
## 47 4.6518 0.1788 0.03911
## 48 4.7527 0.1800 0.03918
## 49 4.8536 0.1800 0.03918
## 50 4.9545 0.1800 0.03828
## 51 5.0555 0.1800 0.03828
## 52 5.1564 0.1800 0.03828
## 53 5.2573 0.1800 0.03828
## 54 5.3582 0.1813 0.03785
## 55 5.4591 0.1800 0.03689
## 56 5.5600 0.1813 0.03875
## 57 5.6609 0.1813 0.03875
## 58 5.7618 0.1813 0.03875
## 59 5.8627 0.1813 0.03875
## 60 5.9636 0.1813 0.03875
## 61 6.0645 0.1813 0.03875
## 62 6.1655 0.1800 0.03918
## 63 6.2664 0.1800 0.03918
## 64 6.3673 0.1800 0.03918
## 65 6.4682 0.1800 0.03918
## 66 6.5691 0.1800 0.03918
## 67 6.6700 0.1800 0.03918
## 68 6.7709 0.1800 0.03918
## 69 6.8718 0.1800 0.03918
## 70 6.9727 0.1800 0.03918
## 71 7.0736 0.1800 0.03918
## 72 7.1745 0.1800 0.03918
## 73 7.2755 0.1800 0.03918
## 74 7.3764 0.1800 0.03918
## 75 7.4773 0.1800 0.03918
## 76 7.5782 0.1800 0.03918
## 77 7.6791 0.1788 0.03866
## 78 7.7800 0.1788 0.03866
## 79 7.8809 0.1788 0.03866
## 80 7.9818 0.1775 0.03855
## 81 8.0827 0.1775 0.03855
## 82 8.1836 0.1775 0.03855
## 83 8.2845 0.1775 0.03855
## 84 8.3855 0.1775 0.03855
## 85 8.4864 0.1775 0.03855
## 86 8.5873 0.1775 0.03855
## 87 8.6882 0.1775 0.03855
## 88 8.7891 0.1775 0.03855
## 89 8.8900 0.1775 0.03855
## 90 8.9909 0.1775 0.03855
## 91 9.0918 0.1775 0.03855
## 92 9.1927 0.1775 0.03855
## 93 9.2936 0.1775 0.03855
## 94 9.3945 0.1775 0.03855
## 95 9.4955 0.1775 0.03855
## 96 9.5964 0.1775 0.03855
## 97 9.6973 0.1775 0.03855
## 98 9.7982 0.1775 0.03855
## 99 9.8991 0.1788 0.03866
## 100 10.0000 0.1788 0.03866
bestmod.svm = tune.out.svm$best.model
summary(bestmod.svm)
##
## Call:
## best.tune(method = svm, train.x = Purchase.1 ~ ., data = OJ.df[train,
## ], ranges = list(cost = c(seq(0.01, 10, length.out = 100))),
## kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.6155
## gamma: 0.05556
##
## Number of Support Vectors: 389
##
## ( 197 192 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
Part e
svmfit.svm.newc <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "radial",
cost = bestmod.svm$cost)
summary(svmfit.svm.newc)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "radial",
## cost = bestmod.svm$cost)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.6155
## gamma: 0.05556
##
## Number of Support Vectors: 389
##
## ( 197 192 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
For the training error rate of the SVM:
truth
predict CH MM
CH 457 82
MM 41 220
Training error rate = 123/800 = 0.15375 (the result may differ each time I knit the HTML).
purchasepred.train.svm.newc <- predict(svmfit.svm.newc, newdata = OJ.df[train,
    ])
table(pred = purchasepred.train.svm.newc, true = OJ.df[train, ]$Purchase.1)
## true
## pred CH MM
## CH 457 82
## MM 41 220
For the test error rate:
truth
predict CH MM
CH 138 32
MM 17 83
Test error rate = 49/270 = 0.1815 (the result may differ each time I knit the HTML).
purchasepred.test.svm.newc <- predict(svmfit.svm.newc, OJ.df[test, ])
table(pred = purchasepred.test.svm.newc, true = OJ.df[test, ]$Purchase.1)
## true
## pred CH MM
## CH 138 32
## MM 17 83
Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2 for part (b).
Part b. Number of support vectors: 607 (305 in one class, 302 in the other). (The result may differ each time I knit the HTML.)
set.seed(4452)
svmfit.svm.po <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "polynomial",
degree = 2, cost = 0.01)
summary(svmfit.svm.po)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "polynomial",
## degree = 2, cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.01
## degree: 2
## gamma: 0.05556
## coef.0: 0
##
## Number of Support Vectors: 607
##
## ( 305 302 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
Part c.
For the training error rate of the SVM with a polynomial kernel of degree 2:
truth
predict CH MM
CH 497 287
MM 1 15
Training error rate = 288/800 = 0.36 (the result may differ each time I knit the HTML).
purchasepred.train.svm.po <- predict(svmfit.svm.po, newdata = OJ.df[train, ])
table(pred = purchasepred.train.svm.po, true = OJ.df[train, ]$Purchase.1)
## true
## pred CH MM
## CH 497 287
## MM 1 15
For the test error rate:
truth
predict CH MM
CH 155 113
MM 0 2
Test error rate = 113/270 = 0.4185 (the result may differ each time I knit the HTML).
purchasepred.test.svm.po <- predict(svmfit.svm.po, OJ.df[test, ])
table(pred = purchasepred.test.svm.po, true = OJ.df[test, ]$Purchase.1)
## true
## pred CH MM
## CH 155 113
## MM 0 2
Part d
cost = 5 (with gamma = 0.05556) results in the lowest cross-validation error rate, 0.17375. The best model has 364 support vectors (189, 175). (The result may differ each time I knit the HTML.)
set.seed(23)
tune.out.svm = tune(svm, Purchase.1 ~ ., data = OJ.df[train, ], kernel = "polynomial",
degree = 2, ranges = list(cost = c(0.01, 0.7, 1, 5, 10)))
summary(tune.out.svm)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.1737
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.3775 0.06003
## 2 0.70 0.1850 0.05329
## 3 1.00 0.1837 0.05002
## 4 5.00 0.1737 0.04185
## 5 10.00 0.1738 0.03558
bestmod.svm = tune.out.svm$best.model
summary(bestmod.svm)
##
## Call:
## best.tune(method = svm, train.x = Purchase.1 ~ ., data = OJ.df[train,
## ], ranges = list(cost = c(0.01, 0.7, 1, 5, 10)), kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 5
## degree: 2
## gamma: 0.05556
## coef.0: 0
##
## Number of Support Vectors: 364
##
## ( 189 175 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
Part e
set.seed(23)
svmfit.svm.newc <- svm(Purchase.1 ~ ., data = OJ.df[train, ], kernel = "polynomial",
    cost = 5, degree = 2)
summary(svmfit.svm.newc)
##
## Call:
## svm(formula = Purchase.1 ~ ., data = OJ.df[train, ], kernel = "polynomial",
## cost = 5, degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 5
## degree: 2
## gamma: 0.05556
## coef.0: 0
##
## Number of Support Vectors: 364
##
## ( 189 175 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
For the training error rate of the SVM:
truth
predict CH MM
CH 457 82
MM 41 220
Training error rate = 123/800 = 0.15375 (the result may differ each time I knit the HTML).
purchasepred.train.svm.newc <- predict(svmfit.svm.newc, newdata = OJ.df[train,
    ])
table(pred = purchasepred.train.svm.newc, true = OJ.df[train, ]$Purchase.1)
## true
## pred CH MM
## CH 457 82
## MM 41 220
For the test error rate:
truth
predict CH MM
CH 138 32
MM 17 83
Test error rate = 49/270 = 0.1815 (the result may differ each time I knit the HTML).
purchasepred.test.svm.newc <- predict(svmfit.svm.newc, OJ.df[test, ])
table(pred = purchasepred.test.svm.newc, true = OJ.df[test, ]$Purchase.1)
## true
## pred CH MM
## CH 138 32
## MM 17 83
Overall, which approach seems to give the best results on this data?
Comparing the tuned models from (b) through (g): the radial and polynomial kernels attain the lowest training error rate (0.15375), but the tuned linear support vector classifier has the lowest test error rate (0.1741, versus 0.1815 for the radial and polynomial kernels). Judged by test error, the linear kernel therefore seems to give the best results on this data.
(a)-(c) are attached with the homework.
I simulated a toy data set:
set.seed(123)
n <- 150 # number of data points
p <- 2 # dimension
sigma <- 1 # variance of the distribution
meanpos <- 0 # centre of the distribution of positive examples
meanneg <- 3 # centre of the distribution of negative examples
npos <- round(n/2) # number of positive examples
nneg <- n - npos # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos * p, mean = meanpos, sd = sigma), npos, p)
xneg <- matrix(rnorm(nneg * p, mean = meanneg, sd = sigma), nneg, p)
x <- rbind(xpos, xneg)
# Generate the labels
y <- matrix(c(rep(1, npos), rep(-1, nneg)))
# Visualize the data
Then I performed the 2-means clustering algorithm. The blue star marks the overall mean of the data set; we can see that the clustering can converge to a local optimum instead of the global optimum.
op <- par(mfrow = c(2, 1))
plot(x, col = ifelse(y < 0, 2, 3), main = "the True Data Visualization")
points(mean(x[, 1]), mean(x[, 2]), col = "blue", pch = 8)
km.out = kmeans(x, 2)
plot(x, col = (km.out$cluster + 1), main = "K-Means Clustering Results with K=2",
xlab = "", ylab = "", pch = 1, cex = 1)
points(mean(x[, 1]), mean(x[, 2]), col = "blue", pch = 8)
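One standard remedy is to restart k-means from many random initializations and keep the best run; kmeans() supports this through its nstart argument. A sketch:
set.seed(123)
# 20 random starts; kmeans keeps the solution with the lowest total within-cluster sum of squares
km.multi <- kmeans(x, 2, nstart = 20)
km.multi$tot.withinss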
**NOTE**
This homework shows the error below when installing the packages:
Error: trying to use CRAN without setting a mirror
This is due to internet restrictions in China; I had to install the packages manually in R instead of calling install.packages() from the console.