For classification

\[ \begin{split} D(u) &=\beta_0 + \sum^P_{j=1}\beta_ju_j\\ &=\beta_0+\sum^n_{i=1}y_i\alpha_ix^{'}_iu \end{split} \]

\(\alpha_i=0\) for samples away from the margin, so only the support vectors contribute to the prediction equation.

The prediction is therefore supported by the data points closest to the boundary, i.e. those classified with the greatest uncertainty.
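As a minimal sketch (the kernlab package and the simulated toy data are assumptions, not from the original notes), the decision value \(D(u)\) can be reconstructed by hand from a fitted linear SVM:

library(kernlab)
set.seed(1)
# toy data: two roughly separable classes
x <- rbind(matrix(rnorm(40, mean =  2), ncol = 2),
           matrix(rnorm(40, mean = -2), ncol = 2))
y <- factor(rep(c(1, -1), each = 20))
fit <- ksvm(x, y, kernel = "vanilladot", C = 1, scaled = FALSE)

sv  <- xmatrix(fit)[[1]]   # support vectors x_i
yia <- coef(fit)[[1]]      # y_i * alpha_i for each support vector
u   <- c(0.5, 0.5)         # a new sample
D_u <- sum(yia * (sv %*% u)) - b(fit)   # b() returns the negative intercept
sign(D_u)                  # predicted side of the boundary
predict(fit, matrix(u, nrow = 1), type = "decision")   # should agree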


Classification of a new sample

Based on the sign of the summation, over the support vectors, of the product of:

  • the sign of the class, \(y_i\)
  • the model parameter, \(\alpha_i\)
  • the dot product between the new sample and the support vector, \(x_i^{'}u\), which combines:
      • the distance of \(x_i\) from the origin
      • the distance of \(u\) from the origin
      • the cosine of the angle between \(x_i\) and \(u\)
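The dot product itself factors into these geometric quantities:

\[ x^{'}_iu=\Vert x_i\Vert\,\Vert u\Vert\cos\theta \]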


When classes are not completely separable

A cost parameter is introduced, and the model is tuned by resampling (e.g. cross-validation).


Tuning parameter

Cost value

Penalises data points on the wrong side of, or inside, the margin.

Increasing the cost value adds complexity to the boundary; too high a value leads to over-fitting, while too low a value may under-fit.
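A short sketch of the effect (the kernlab package, the simulated overlapping classes, and the radial basis kernel introduced below are assumptions, not from the original notes):

library(kernlab)
set.seed(2)
x <- matrix(rnorm(200), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(100) > 0, "A", "B"))

fit_low  <- ksvm(x, y, kernel = "rbfdot", C = 0.1)   # low cost: smoother boundary
fit_high <- ksvm(x, y, kernel = "rbfdot", C = 100)   # high cost: boundary bends to single points
nSV(fit_low); nSV(fit_high)   # compare how many support vectors each fit keeps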

Kernel functions

Substituting the linear cross product \(x^{'}_iu\) with a kernel function achieves more flexible, nonlinear decision boundaries.

\[ \begin{split} D(u)&=\beta_0+\sum^n_{i=1}y_i\alpha_ix^{'}_iu\\&=\beta_0+\sum^n_{i=1}y_i\alpha_iK(x_i,u) \end{split} \]
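For example, the Gaussian radial basis function kernel used in the code example below is

\[ K(x_i,u)=\exp(-\sigma\Vert x_i-u\Vert^2) \]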

Centering and scaling the predictors beforehand prevents predictors that are large in magnitude from dominating the kernel calculations.

Specialized kernels

  • graph kernels in QSAR
  • “bag-of-words” approach and string kernels in text mining

For regression

The prediction line is supported by the poorly predicted points, those lying outside \(\pm \epsilon\) of the regression line; samples within the \(\epsilon\) band have no influence on the fit.

The SVM regression coefficients minimize the \(\epsilon\)-insensitive loss function: \[ Cost\sum^n_{i=1}L_\epsilon(y_i-\hat{y}_i)+\sum^P_{j=1}\beta^2_j \]
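Here \(L_\epsilon\) is the \(\epsilon\)-insensitive loss, which ignores residuals smaller than \(\epsilon\) and grows linearly beyond it:

\[ L_\epsilon(r)=\begin{cases} 0 & \text{if } |r|\le\epsilon\\ |r|-\epsilon & \text{otherwise} \end{cases} \]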

The cost parameter penalises large residuals; it is set by the user, typically by tuning over resamples.

library(AppliedPredictiveModeling)
data(solubility)

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
svmRTuned <- train(solTrainXtrans, solTrainY,
                   method = "svmRadial",            # radial basis function kernel
                   preProc = c("center","scale"),   # center and scale predictors first
                   tuneLength = 14,                 # try 14 cost values, C = 2^-2 ... 2^11
                   trControl = trainControl(method = "cv"))   # 10-fold cross-validation
## Loading required package: kernlab
svmRTuned
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 951 samples
## 228 predictors
## 
## Pre-processing: centered (228), scaled (228) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 857, 856, 856, 855, 855, 856, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   RMSE SD     Rsquared SD
##      0.25  0.7987222  0.8679683  0.08522893  0.01588388 
##      0.50  0.7042621  0.8900392  0.06991161  0.01407636 
##      1.00  0.6533926  0.9018439  0.06163992  0.01386187 
##      2.00  0.6213222  0.9095591  0.06079800  0.01457232 
##      4.00  0.6061262  0.9134779  0.06108246  0.01603500 
##      8.00  0.5920402  0.9171835  0.05957932  0.01606659 
##     16.00  0.5899165  0.9178490  0.05610869  0.01575665 
##     32.00  0.5895256  0.9179679  0.05434215  0.01542801 
##     64.00  0.5911698  0.9176429  0.05053791  0.01451370 
##    128.00  0.5928217  0.9172239  0.04642479  0.01363293 
##    256.00  0.5958985  0.9163522  0.04313798  0.01336025 
##    512.00  0.5978189  0.9157150  0.04313953  0.01402077 
##   1024.00  0.5993385  0.9153275  0.04324524  0.01409860 
##   2048.00  0.6028148  0.9143679  0.04378952  0.01412931 
## 
## Tuning parameter 'sigma' was held constant at a value of 0.00268803
## RMSE was used to select the optimal model using  the smallest value.
## The final values used for the model were sigma = 0.00268803 and C = 32.
svmRTuned$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 32 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.00268802992487251 
## 
## Number of Support Vectors : 625 
## 
## Objective Function Value : -378.2904 
## Training error : 0.009743
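The tuned model can then be applied to the held-out solubility test set (a brief sketch; postResample from caret reports RMSE and \(R^2\)):

svmPred <- predict(svmRTuned, solTestXtrans)
postResample(pred = svmPred, obs = solTestY)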

Resources

Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer.