The Support Vector Machine (SVM) is a useful classification technique. Support vector machine methods can handle both linear and non-linear class boundaries, and can be used for both two-class and multi-class classification problems. In real-life data, the separation boundary is generally non-linear. Technically, the SVM algorithm performs non-linear classification using what is called the kernel trick. The most commonly used kernel transformations are the polynomial kernel and the radial kernel.
In this demo, we'll describe how to build an SVM classifier using the caret R package.
Note there is also an extension of the SVM for regression, called support vector regression.
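As a quick illustration of what these kernel functions compute, the two kernels can be written directly as small R functions. This is a standalone sketch (caret and the underlying kernlab package handle the kernel computations internally); the function names and default parameter values here are purely illustrative.
# Radial basis function (RBF) kernel: exp(-sigma * ||x - y||^2)
rbf_kernel <- function(x, y, sigma = 1) {
  exp(-sigma * sum((x - y)^2))
}
# Polynomial kernel: (scale * <x, y> + offset)^degree
poly_kernel <- function(x, y, degree = 2, scale = 1, offset = 1) {
  (scale * sum(x * y) + offset)^degree
}
# Example: kernel values for two small feature vectors
rbf_kernel(c(1, 2), c(2, 0), sigma = 0.5)
poly_kernel(c(1, 2), c(2, 0), degree = 2)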
library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.2.1     v purrr   0.3.3
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   1.0.0     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(caret)
Loading required package: lattice
Attaching package: 'caret'
The following object is masked from 'package:purrr':
    lift
The PimaIndiansDiabetes2 data set in the mlbench package will be used to predict whether or not an individual tested positive for diabetes based on multiple clinical variables.
There aren't enough observations to split into separate training and test sets, so again we'll rely on cross-validation to estimate the test error.
# Load the data
data("PimaIndiansDiabetes2", package = "mlbench")
pima.data <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(pima.data, 3)
# Set up Repeated k-fold Cross Validation
train_control <- trainControl(method="repeatedcv", number=10, repeats=3)
In the following example, the variables are normalized to make their scales comparable. This is done automatically before building the SVM classifier by setting the option preProcess = c("center", "scale").
# Fit the model
svm1 <- train(diabetes ~., data = pima.data, method = "svmLinear", trControl = train_control, preProcess = c("center","scale"))
#View the model
svm1
Support Vector Machines with Linear Kernel
392 samples
8 predictor
2 classes: 'neg', 'pos'
Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 353, 353, 353, 353, ...
Resampling results:
Accuracy Kappa
0.7841026 0.4840113
Tuning parameter 'C' was held constant at a value of 1
The accuracy of this model is 0.7841026.
Note that there is a tuning parameter C, also known as the cost, that controls how misclassifications are penalized: it imposes a penalty on the model for each training error, so the higher the value of C, the less likely the SVM algorithm is to misclassify a point.
By default, caret builds the SVM linear classifier using C = 1.
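For reference, C enters the standard soft-margin SVM optimization problem as the weight on the slack (misclassification) terms. In textbook notation (not specific to caret):
\[
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
\]
A larger C puts more weight on the slack variables \(\xi_i\), so the fitted margin tolerates fewer misclassified training points.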
It is possible to automatically compute the SVM for different values of C and to choose the optimal one, i.e. the value that maximizes the model's cross-validation accuracy.
The following R code computes the SVM over a grid of values of C and automatically chooses the final model used for predictions:
# Fit the model
svm2 <- train(diabetes ~., data = pima.data, method = "svmLinear", trControl = train_control, preProcess = c("center","scale"), tuneGrid = expand.grid(C = seq(0, 2, length = 20)))
#View the model
svm2
Support Vector Machines with Linear Kernel
392 samples
8 predictor
2 classes: 'neg', 'pos'
Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 352, 353, 353, 352, 353, ...
Resampling results across tuning parameters:
C Accuracy Kappa
0.0000000 NaN NaN
0.1052632 0.7804915 0.4751083
0.2105263 0.7821795 0.4807332
0.3157895 0.7796154 0.4734434
0.4210526 0.7804915 0.4756044
0.5263158 0.7796368 0.4735079
0.6315789 0.7779274 0.4696187
0.7368421 0.7787607 0.4718523
0.8421053 0.7787607 0.4718523
0.9473684 0.7787607 0.4718523
1.0526316 0.7787607 0.4718523
1.1578947 0.7787607 0.4718523
1.2631579 0.7787607 0.4716954
1.3684211 0.7787607 0.4716954
1.4736842 0.7787607 0.4716954
1.5789474 0.7787607 0.4716954
1.6842105 0.7787607 0.4716954
1.7894737 0.7787607 0.4716954
1.8947368 0.7787607 0.4716954
2.0000000 0.7787607 0.4716954
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was C = 0.2105263.
# Plot model accuracy vs different values of Cost
plot(svm2)
# Print the best tuning parameter C that
# maximizes model accuracy
svm2$bestTune
We can save the resampling results of the best model for later:
# Keep the row of the results table with the highest cross-validation accuracy
res2 <- as_tibble(svm2$results[which.max(svm2$results$Accuracy), ])
res2
The best choice, \(C\) = 0.2105263, gives an accuracy of 0.7821795, essentially the same as our first SVM where C was held constant at 1 (accuracy = 0.7841026).
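Once the final model has been selected, it can be used to make class predictions with predict(). The snippet below is only a sketch: for illustration it predicts back on the training data, whereas in practice you would predict on genuinely new observations.
# Predict classes with the tuned linear SVM (illustrative: re-using the training data)
pred2 <- predict(svm2, newdata = pima.data)
# Confusion matrix of predicted vs. observed classes
confusionMatrix(pred2, pima.data$diabetes)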
To build a non-linear SVM classifier, we can use either a polynomial kernel or a radial kernel function. Again, the caret package can be used to easily compute the polynomial and the radial non-linear SVM models.
The package automatically chooses the optimal values of the model tuning parameters, where optimal is defined as the values that maximize the model cross-validation accuracy.
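If you want to see which tuning parameters caret will search for a given method, modelLookup() lists them; output is omitted here.
# Tuning parameters searched for the radial and polynomial SVM methods
modelLookup("svmRadial")
modelLookup("svmPoly")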
Computing SVM using radial basis kernel:
# Fit the model
svm3 <- train(diabetes ~., data = pima.data, method = "svmRadial", trControl = train_control, preProcess = c("center","scale"), tuneLength = 10)
# Print the best tuning parameter sigma and C that maximizes model accuracy
svm3$bestTune
#View the model
svm3
Support Vector Machines with Radial Basis Function Kernel
392 samples
8 predictor
2 classes: 'neg', 'pos'
Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 353, 352, 353, 353, 353, ...
Resampling results across tuning parameters:
C Accuracy Kappa
0.25 0.7601496 0.4192415
0.50 0.7567521 0.4142327
1.00 0.7559402 0.4207065
2.00 0.7619444 0.4400809
4.00 0.7517521 0.4179284
8.00 0.7397863 0.3912111
16.00 0.7346581 0.3913482
32.00 0.7133974 0.3478559
64.00 0.6979274 0.3204260
128.00 0.6945940 0.3190454
Tuning parameter 'sigma' was held constant at a value of 0.1230672
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.1230672 and C = 2.
# Save the resampling results of the best radial model for later
res3 <- as_tibble(svm3$results[which.max(svm3$results$Accuracy), ])
res3
The best choice, sigma = 0.1230672 and \(C\) = 2, gives an accuracy of 0.7619444, which is slightly lower than the accuracies obtained with the linear kernel above.
Computing SVM using polynomial basis kernel:
# Fit the model
svm4 <- train(diabetes ~., data = pima.data, method = "svmPoly", trControl = train_control, preProcess = c("center","scale"), tuneLength = 4)
# Print the best tuning parameter sigma and C that maximizes model accuracy
svm4$bestTune
#View the model
svm4
Support Vector Machines with Polynomial Kernel
392 samples
8 predictor
2 classes: 'neg', 'pos'
Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 353, 352, 352, 353, 353, 353, ...
Resampling results across tuning parameters:
degree scale C Accuracy Kappa
1 0.001 0.25 0.6683333 0.00000000
1 0.001 0.50 0.6683333 0.00000000
1 0.001 1.00 0.6683333 0.00000000
1 0.001 2.00 0.6657692 -0.00500000
1 0.010 0.25 0.6836752 0.07040592
1 0.010 0.50 0.7668803 0.40105993
1 0.010 1.00 0.7838889 0.47186232
1 0.010 2.00 0.7856197 0.48361479
1 0.100 0.25 0.7855983 0.48304989
1 0.100 0.50 0.7796368 0.47420746
1 0.100 1.00 0.7796154 0.47569759
1 0.100 2.00 0.7779060 0.47154204
1 1.000 0.25 0.7787393 0.47322654
1 1.000 0.50 0.7813034 0.47804154
1 1.000 1.00 0.7804487 0.47730817
1 1.000 2.00 0.7795940 0.47494083
2 0.001 0.25 0.6683333 0.00000000
2 0.001 0.50 0.6683333 0.00000000
2 0.001 1.00 0.6657692 -0.00500000
2 0.001 2.00 0.7575855 0.35729903
2 0.010 0.25 0.7694444 0.40553118
2 0.010 0.50 0.7796368 0.45907429
2 0.010 1.00 0.7796581 0.46694420
2 0.010 2.00 0.7788248 0.47008954
2 0.100 0.25 0.7677350 0.43305534
2 0.100 0.50 0.7652137 0.42574059
2 0.100 1.00 0.7686538 0.43448349
2 0.100 2.00 0.7771795 0.45749313
2 1.000 0.25 0.7610897 0.42118353
2 1.000 0.50 0.7525855 0.40420700
2 1.000 1.00 0.7466667 0.39296634
2 1.000 2.00 0.7475214 0.39705992
3 0.001 0.25 0.6683333 0.00000000
3 0.001 0.50 0.6683333 0.00000000
3 0.001 1.00 0.7015171 0.14882513
3 0.001 2.00 0.7702778 0.41655519
3 0.010 0.25 0.7745085 0.43300882
3 0.010 0.50 0.7779274 0.45775767
3 0.010 1.00 0.7771154 0.45973836
3 0.010 2.00 0.7711325 0.44870463
3 0.100 0.25 0.7805556 0.45637424
3 0.100 0.50 0.7763248 0.45623279
3 0.100 1.00 0.7703846 0.44808827
3 0.100 2.00 0.7609402 0.42899298
3 1.000 0.25 0.6920940 0.30815002
3 1.000 0.50 0.6879060 0.29726365
3 1.000 1.00 0.6913034 0.30881100
3 1.000 2.00 0.7066453 0.34463612
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were degree = 1, scale = 0.01 and C = 2.
# Save the resampling results of the best polynomial model for later
res4 <- as_tibble(svm4$results[which.max(svm4$results$Accuracy), ])
res4
The best choice, degree = 1, scale = 0.01 and \(C\) = 2, gives an accuracy of 0.7856197, marginally higher than our first three SVMs. Note, however, that the selected degree is 1, so this "polynomial" kernel is effectively a linear one.
In these examples, then, the non-linear kernels do not clearly outperform the linear model: the radial kernel does slightly worse, and the best polynomial model reduces to a degree-1 (linear) kernel. On data with genuinely non-linear class boundaries, however, the non-linear kernels can give a substantial improvement.
df <- tibble(Model = c('SVM Linear', 'SVM Linear w/ choice of cost', 'SVM Radial', 'SVM Poly'),
             Accuracy = c(svm1$results$Accuracy, res2$Accuracy, res3$Accuracy, res4$Accuracy))
df %>% arrange(Accuracy)
This demo described how to use support vector machines for classification tasks. Alternatives exist, such as logistic regression, discriminant analysis, and tree-based methods; you need to assess the performance of the different methods on your own data in order to choose the best one.
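Because all four models were fit with caret using the same trainControl specification, their cross-validation results can also be compared directly with resamples(). This is a sketch that assumes the svm1 to svm4 objects above are still in the workspace; for a strictly paired comparison the models should share the same resampling indices (e.g. by setting index in trainControl() or the same random seed before each fit).
# Collect the resampling results of the four fitted models
model_list <- resamples(list(Linear = svm1, LinearTuned = svm2, Radial = svm3, Poly = svm4))
# Summary of accuracy and kappa across the resamples
summary(model_list)
# Visual comparison of the cross-validated accuracy distributions
bwplot(model_list, metric = "Accuracy")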