Last name_Duncan_________First name_Dennis________ RUID No. 046-00-9156
Advanced Classification: Classifiers and Support Vector Machine
PART I: Support vector classifier (refer to Section 9.6 of the textbook), 13 marks
Step 1 Set the variable k equal to the last 4 digits of your student number, then initialize the random number generator with set.seed(k). This requirement makes the project results differ between students with very high probability. Do not re-set this value in later steps of this work.
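Per Step 1, the last four digits of the student number above are 9156 (the same value of k appears again in Part II), so the seed is set once as:
k <- 9156       # last four digits of RUID 046-00-9156
set.seed(k)     # initialize the random number generator once for the whole project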
Step 2 (1 mark) We begin by generating the observations, which belong to two classes, and checking whether the classes are linearly separable. Use the matrix command to generate the two sets of data and plot them with the plot command. Show the plot and answer whether the two sets are separable.
x <- matrix(rnorm(20 * 2), ncol = 2)   # 20 observations in two dimensions
y <- c(rep(-1, 10), rep(1, 10))        # class labels: 10 observations in each class
x[y == 1, ] <- x[y == 1, ] + 1         # shift the second class by one unit
plot(x, col = (3 - y))                 # plot the two classes in different colors
Step 3 (1 mark) Fit the support vector classifier with cost value 0.1. Note that in order for the svm() function to perform classification (as opposed to SVM-based regression), we must encode the response as a factor variable. Provide a summary of svmfit and plot the support vector classifier obtained.
An important point: before following the instructions from the textbook or using the R commands from the website, you have to install the e1071 package.
#install.packages("e1071", dep = TRUE)
dat <- data.frame(x = x, y = as.factor(y))
library(e1071)
svmfit <- svm(y ~ ., data = dat, kernel = "linear",
cost = 0.1, scale =FALSE)
summary(svmfit)
##
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 0.1, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.1
##
## Number of Support Vectors: 16
##
## ( 8 8 )
##
##
## Number of Classes: 2
##
## Levels:
## -1 1
plot(svmfit, dat)
Step 4 (1 mark) Determine the identities of the support vectors.
svmfit$index
## [1] 1 2 3 4 5 7 9 10 12 13 14 15 16 17 18 20
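These are the row numbers of the observations that serve as support vectors. They can be marked on the scatter plot, and, because the kernel is linear and scale = FALSE, the fitted hyperplane can also be recovered from the same components of the fitted object (a sketch; the names w and b are introduced here for illustration):
plot(x, col = (3 - y))
points(x[svmfit$index, ], pch = 5, cex = 2)  # mark the support vectors with open diamonds
w <- t(svmfit$coefs) %*% x[svmfit$index, ]   # hyperplane weights from the support vectors
b <- -svmfit$rho                             # intercept of the decision function x %*% w + b
abline(a = -b / w[2], b = -w[1] / w[2])      # decision boundary w1*x1 + w2*x2 + b = 0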
Step 5 (1 mark) Increase the cost parameter to 10. Check and identify the support vectors and note how their number changed.
svmfit <- svm(y ~ ., data = dat, kernel = "linear",
              cost = 10, scale = FALSE)
svmfit
##
## Call:
## svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 10
##
## Number of Support Vectors: 7
plot(svmfit, dat)
Increasing cost from 0.1 to 10 reduces the number of support vectors from 16 to 7: a larger cost gives a narrower margin, so fewer observations lie on the margin or violate it.
Step 6 (1 mark) Compare SVMs with a linear kernel over a range of values of the cost parameter. Print and interpret the summary.
tune.out <- tune(svm, y ~ ., data = dat, kernel = "linear",
ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.1
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.65 0.4743416
## 2 1e-02 0.65 0.4743416
## 3 1e-01 0.15 0.3374743
## 4 1e+00 0.15 0.2415229
## 5 5e+00 0.10 0.2108185
## 6 1e+01 0.10 0.2108185
## 7 1e+02 0.15 0.2415229
We see that cost = 5 (together with cost = 10) gives the lowest cross-validation error rate of 0.10, and tune() selects cost = 5 as the best value.
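The chosen parameter and its cross-validated error can also be extracted programmatically from the tune object:
tune.out$best.parameters    # cost value selected by 10-fold cross-validation
tune.out$best.performance   # the corresponding cross-validation error (0.10 here)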
Step 7 (1 mark) The tune() function stores the best model obtained, which can be accessed with tune.out$best.model. Print its summary.
bestmod=tune.out$best.model
summary(bestmod)
##
## Call:
## best.tune(method = svm, train.x = y ~ ., data = dat, ranges = list(cost = c(0.001,
## 0.01, 0.1, 1, 5, 10, 100)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
##
## Number of Support Vectors: 8
##
## ( 4 4 )
##
##
## Number of Classes: 2
##
## Levels:
## -1 1
Step 8 (2 marks) Generate the test data set and predict the class labels of these test observations.
xtest <- matrix(rnorm(20 * 2), ncol = 2)
ytest <- sample(c(-1, 1), 20, rep = TRUE)
xtest[ytest == 1, ] <- xtest[ytest == 1, ] + 1
testdat <- data.frame(x = xtest, y = as.factor(ytest))
#predict class of observations
ypred <- predict(bestmod, testdat)
table(predict = ypred, truth = testdat$y)
## truth
## predict -1 1
## -1 10 2
## 1 1 7
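The confusion matrix shows that 10 + 7 = 17 of the 20 test observations are classified correctly and 3 are misclassified. The test error rate can also be computed directly (a minimal sketch):
mean(ypred != testdat$y)    # proportion of misclassified test observations (3/20 = 0.15)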
Step 9 (2 marks) Now consider a situation in which the two classes are linearly separable; then a separating hyperplane can be found with the svm() function. First, further separate the two classes in our simulated data so that they are linearly separable.
x[y == 1, ] <- x[y == 1, ] + 0.5
plot(x, col = (y + 5) / 2, pch = 19)
Step 10 (2 marks) Fit the support vector classifier and plot the resulting hyperplane, using a very large value of cost so that no observations are misclassified.
dat <- data.frame(x = x, y = as.factor(y))
# the argument must be spelled 'kernel'; a misspelled name (e.g. 'kernal') is
# silently ignored and svm() then falls back to its default radial kernel
svmfit <- svm(y ~ ., data = dat, kernel = "linear",
              cost = 1e5)
summary(svmfit)
##
## Call:
## svm(formula = y ~ ., data = dat, kernal = "linear", cost = 1e+05)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1e+05
##
## Number of Support Vectors: 6
##
## ( 2 4 )
##
##
## Number of Classes: 2
##
## Levels:
## -1 1
plot(svmfit, dat)
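With such a large cost the fitted classifier should misclassify no (or almost no) training observations; this can be checked with a training confusion matrix (a sketch):
table(predict = predict(svmfit, dat), truth = dat$y)   # training confusion matrix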
Step 11 (1 mark) Answer the multiple choice questions:
2. Are the support vectors on the border of the margin? YES
3. Are the support vectors within the margin? YES
PART II: Support vector machine
In order to fit an SVM with a non-linear kernel, we again use the svm() function, but with a different value of the kernel parameter. To fit an SVM with a polynomial kernel we use kernel = "polynomial", and to fit an SVM with a radial kernel we use kernel = "radial". In the former case we also use the degree argument to specify a degree for the polynomial kernel (this is d in (9.22)), and in the latter case we use gamma to specify a value of γ for the radial basis kernel (9.24).
Step 1 (1 mark) Generate some data with a non-linear class boundary and plot them.
k = 9156
set.seed(k)
x = matrix(rnorm(200 * 2), ncol = 2)   # 200 observations in two dimensions
x[1:100, ] = x[1:100, ] + 2            # shift the first 100 observations
# note: the textbook's version of this example also shifts rows 101-150 by -2,
# which makes the class boundary clearly non-linear
y = c(rep(1, 150), rep(2, 50))         # 150 observations in class 1, 50 in class 2
dat = data.frame(x = x, y = as.factor(y))
plot(x, col = y)
Step 2 (1 mark) Fit the training data using the svm() function with a radial kernel and γ = 1.
train = sample(200, 100)
svmfit = svm(y ~ ., data = dat[train, ], kernel = "radial", gamma = 1,
cost =1)
Step 3 (1 mark) Print the summary. What can you tell about the error? Re-fit the SVM classification with a higher cost. Print the summary and plot the results. What are your major concerns about these results?
plot(svmfit, dat[train, ])
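Step 3 also asks for a re-fit with a higher cost; a sketch of that re-fit is given below (the value cost = 1e5 and the object name svmfit_hc are assumptions, following the textbook's example):
summary(svmfit)                        # summary of the gamma = 1, cost = 1 fit
svmfit_hc <- svm(y ~ ., data = dat[train, ], kernel = "radial", gamma = 1,
                 cost = 1e5)           # re-fit with a much larger cost
summary(svmfit_hc)
plot(svmfit_hc, dat[train, ])          # expect fewer training errors but a more irregular boundary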
Step 4 (1 mark) Perform cross-validation using tune() to select the best choice of γ and cost for an SVM with a radial kernel.
tune.out = tune(svm, y ~ ., data = dat[train, ], kernel = "radial",
ranges = list(cost = c(0.1, 1, 10, 100, 1000), gamma = c(0.5, 1, 2, 3, 4)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 0.1 0.5
##
## - best performance: 0.24
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-01 0.5 0.24 0.1712698
## 2 1e+00 0.5 0.25 0.1715938
## 3 1e+01 0.5 0.32 0.1475730
## 4 1e+02 0.5 0.35 0.1581139
## 5 1e+03 0.5 0.37 0.1418136
## 6 1e-01 1.0 0.24 0.1712698
## 7 1e+00 1.0 0.28 0.1619328
## 8 1e+01 1.0 0.35 0.1581139
## 9 1e+02 1.0 0.37 0.1567021
## 10 1e+03 1.0 0.39 0.1523884
## 11 1e-01 2.0 0.24 0.1712698
## 12 1e+00 2.0 0.30 0.1563472
## 13 1e+01 2.0 0.38 0.1549193
## 14 1e+02 2.0 0.37 0.1418136
## 15 1e+03 2.0 0.28 0.1316561
## 16 1e-01 3.0 0.24 0.1712698
## 17 1e+00 3.0 0.31 0.1728840
## 18 1e+01 3.0 0.39 0.1523884
## 19 1e+02 3.0 0.34 0.1712698
## 20 1e+03 3.0 0.27 0.1251666
## 21 1e-01 4.0 0.24 0.1712698
## 22 1e+00 4.0 0.29 0.1595131
## 23 1e+01 4.0 0.37 0.1494434
## 24 1e+02 4.0 0.34 0.1505545
## 25 1e+03 4.0 0.28 0.1549193
We can see from the plot in Step 3 that there are a fair number of training errors in this SVM fit. Increasing the value of cost reduces the number of training errors, but at the price of a more irregular decision boundary and a risk of overfitting the training data.
Step 5 (1 mark) Interpret the results: what are the optimal values of cost and γ, and what is the lowest percentage of misclassified observations?
table(true = dat[-train, "y"], pred = predict(tune.out$best.model, newdata = dat[-train, ]))
## pred
## true 1 2
## 1 74 0
## 2 26 0
26% of the test observations (26 of 100) are misclassified by this SVM; note that the best model assigns every test observation to class 1. The optimal value of cost is 0.1 and the optimal value of γ is 0.5, with a cross-validation error of 0.24.
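Equivalently, the test misclassification rate can be computed directly from the predictions of the best model (a sketch; pred is a name introduced here):
pred <- predict(tune.out$best.model, newdata = dat[-train, ])
mean(pred != dat[-train, "y"])   # proportion misclassified: 26 / 100 = 0.26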