Introduction

This is a short introduction to how an SVM classifies a data set with a radial kernel. First, let’s set up a random sample with a non-linear class boundary (side note: a radial boundary is non-linear).

Data

The following function creates a random sample of data, which we modify as follows:
- To the first 50%, we add an offset (offset_plus)
- To the next 25%, we subtract an offset (offset_minus)
- and to these modified 75%, we assign class 1, and to the rest, class 2
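
As a quick check of the split described above, here is a minimal sketch of the index arithmetic for 200 observations. The names idx_plus, idx_minus and idx_rest are only illustrative and are not part of the function itself.

```r
# Illustrative only: how 200 rows fall into the three groups described above.
num_observations <- 200
first_50 <- round(num_observations / 2)   # rows that get + offset_plus
next_25  <- round(num_observations / 4)   # rows that get - offset_minus

idx_plus  <- 1:first_50
idx_minus <- (first_50 + 1):(first_50 + next_25)
idx_rest  <- (first_50 + next_25 + 1):num_observations

# Class 1 is made up of the "plus" and "minus" rows, class 2 of the rest
group_sizes <- c(plus = length(idx_plus),
                 minus = length(idx_minus),
                 rest = length(idx_rest))
print(group_sizes)
```

With 200 observations this gives 100, 50 and 50 rows, so class 1 ends up three times as large as class 2.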

WORK IN PROGRESS: I’d like to update this small function so that it can give me back a random sample in a multi-variate setting (with the corresponding data manipulation - the offsetting - of course)

fcn_two_class_random_sample <- function(num_observations, num_predictors, num_classes, offset_plus, offset_minus, seed){
        set.seed(seed)
        x <- matrix(rnorm(num_observations * num_predictors), ncol = num_predictors)
        
        # add "offset_plus" to the first 50% of rows (all columns)
        # subtract "offset_minus" from the next 25% of rows (all columns)
        first_50 <- round(num_observations/2)
        next_25 <- round(num_observations/4)
        x[1:first_50, ] <- x[1:first_50, ] + offset_plus
        x[(first_50+1):(first_50 + next_25), ] <- x[(first_50+1):(first_50 + next_25), ] - offset_minus
        
        # assign classes to the data
        # for now only two classes are possible; num_classes is not used yet
        y <- c(rep(2, num_observations))
        y[1:(first_50+next_25)] <- c(rep(1, (first_50+next_25)))
        
        y <- as.factor(y)
        
        return(data.frame(x = x, y = y))
}

Let’s start with a simple random data set and plot it:

data1 <- fcn_two_class_random_sample(200, 2, 2, 2, 2, 1)
dim(data1) # dimensions of the data frame: two predictors plus the outcome
#> [1] 200   3
plot(x = data1$x.1, y = data1$x.2, col = data1$y)

To apply an SVM to this data set, we require the e1071 library, which includes the function svm(). Also important to note are:
- the parameter cost specifies the cost of violating the margin: when cost is large, the margins will be narrow and there will be few support vectors on the margin or violating the margin.
- the parameter gamma (\(\gamma\)) is a positive constant that scales the radial kernel, the function that quantifies the similarity of two observations. Larger values make the similarity fall off faster with distance, which allows a more flexible decision boundary.
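
To make the role of gamma more concrete, here is a small sketch of the radial kernel in the form the e1071 documentation gives for kernel = "radial", exp(-gamma * |u - v|^2). The helper rbf_kernel and the two points are made up for illustration; svm() computes this internally.

```r
# Radial kernel as documented for e1071::svm: exp(-gamma * |u - v|^2)
# (rbf_kernel is a hypothetical helper, not part of e1071)
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

u <- c(0, 0)
v <- c(1, 1)   # squared distance to u is 2

# Identical points always have similarity 1, whatever gamma is
self_similarity <- rbf_kernel(u, u, gamma = 1)
print(self_similarity)

# For the same pair of points, a larger gamma gives a smaller similarity
gammas <- c(0.1, 1, 4)
sims <- sapply(gammas, function(g) rbf_kernel(u, v, gamma = g))
print(data.frame(gamma = gammas, similarity = round(sims, 4)))
```

With a large gamma only very close observations count as similar, so each training point influences a small neighbourhood and the boundary can bend more.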

require(e1071)
#> Loading required package: e1071
#> Warning: package 'e1071' was built under R version 3.3.3
train <- sample(200, 100, replace = FALSE) # 100 distinct row numbers drawn from 1:200 for the training set
# Let's first look at the test and training data:
par(mfrow=c(1,2))
plot(x = data1$x.2[train], y = data1$x.1[train], col = data1$y[train], main = "Training Data")
plot(x = data1$x.2[-train], y = data1$x.1[-train], col = data1$y[-train], main = "Test Data")
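
The negative index -train used above picks every row that was not drawn for training. A minimal sketch of that split, with an assumed seed and the hypothetical names train_idx and test_idx:

```r
set.seed(1)                        # assumed seed, only to make this sketch repeatable
n <- 200
train_idx <- sample(n, n / 2)      # 100 distinct row numbers
test_idx  <- setdiff(seq_len(n), train_idx)

# Negative indexing selects exactly the rows in test_idx
same_rows <- identical(seq_len(n)[-train_idx], test_idx)
print(c(train = length(train_idx), test = length(test_idx)))
print(same_rows)
```

Because sample() draws without replacement by default, the two halves never overlap.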


svmfit1 <- svm(y ~ ., data = data1[train, ], kernel = "radial", gamma = 1, cost = 1)
summary(svmfit1)
#> 
#> Call:
#> svm(formula = y ~ ., data = data1[train, ], kernel = "radial", 
#>     gamma = 1, cost = 1)
#> 
#> 
#> Parameters:
#>    SVM-Type:  C-classification 
#>  SVM-Kernel:  radial 
#>        cost:  1 
#>       gamma:  1 
#> 
#> Number of Support Vectors:  37
#> 
#>  ( 17 20 )
#> 
#> 
#> Number of Classes:  2 
#> 
#> Levels: 
#>  1 2
plot(svmfit1, data1[train,])

Increasing the cost comes at a price: it reduces the number of training errors, but the decision boundary becomes more irregular, which can lead to over-fitting. Here is an example:

svmfit2 <- svm(y ~ ., data = data1[train, ], kernel = "radial", gamma = 1, cost = 1e8)
plot(svmfit2, data1[train,])
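
One way to look at this trade-off in numbers rather than plots is to compare training and test error for a small and a large cost. The following is only a sketch: it rebuilds a sample with the same recipe as above, uses cost = 1e5 instead of 1e8 to keep the fit quick, and error_rate is a hypothetical helper. No results are shown because they were not part of the original run.

```r
library(e1071)

# Rebuild a two-class sample with the same recipe as above (offsets of 2)
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ]   <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
dat <- data.frame(x = x, y = factor(c(rep(1, 150), rep(2, 50))))
train_idx <- sample(200, 100)

# Share of misclassified rows for a fitted model on a given data set
error_rate <- function(fit, d) mean(predict(fit, newdata = d) != d$y)

# Fit once with a small and once with a large cost, same gamma
results <- sapply(c(1, 1e5), function(cost_value) {
  fit <- svm(y ~ ., data = dat[train_idx, ], kernel = "radial",
             gamma = 1, cost = cost_value)
  c(cost = cost_value,
    train_error = error_rate(fit, dat[train_idx, ]),
    test_error  = error_rate(fit, dat[-train_idx, ]))
})
print(t(results))
```

If the large cost over-fits, one would expect its training error to drop while its test error stays the same or rises.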

But how do we find the values of these two parameters that give the best prediction for our problem? For this, e1071 provides the function tune(), which comes in handy: for every combination in a defined range of input parameters, it cross-validates the SVM (10-fold by default) and reports the resulting error.

set.seed(1)
output.tune <- tune(svm, y ~ ., data = data1[train, ], kernel = "radial",
                    ranges = list(costs = c(0.01, 0.1, 1, 10, 100, 1000), 
                                  gamma = c(0.1, 0.5, 1, 2, 3, 4)))
summary(output.tune)
#> 
#> Parameter tuning of 'svm':
#> 
#> - sampling method: 10-fold cross validation 
#> 
#> - best parameters:
#>  costs gamma
#>   0.01     2
#> 
#> - best performance: 0.12 
#> 
#> - Detailed performance results:
#>    costs gamma error dispersion
#> 1  1e-02   0.1  0.26 0.12649111
#> 2  1e-01   0.1  0.26 0.12649111
#> 3  1e+00   0.1  0.26 0.12649111
#> 4  1e+01   0.1  0.26 0.12649111
#> 5  1e+02   0.1  0.26 0.12649111
#> 6  1e+03   0.1  0.26 0.12649111
#> 7  1e-02   0.5  0.13 0.08232726
#> 8  1e-01   0.5  0.13 0.08232726
#> 9  1e+00   0.5  0.13 0.08232726
#> 10 1e+01   0.5  0.13 0.08232726
#> 11 1e+02   0.5  0.13 0.08232726
#> 12 1e+03   0.5  0.13 0.08232726
#> 13 1e-02   1.0  0.13 0.08232726
#> 14 1e-01   1.0  0.13 0.08232726
#> 15 1e+00   1.0  0.13 0.08232726
#> 16 1e+01   1.0  0.13 0.08232726
#> 17 1e+02   1.0  0.13 0.08232726
#> 18 1e+03   1.0  0.13 0.08232726
#> 19 1e-02   2.0  0.12 0.09189366
#> 20 1e-01   2.0  0.12 0.09189366
#> 21 1e+00   2.0  0.12 0.09189366
#> 22 1e+01   2.0  0.12 0.09189366
#> 23 1e+02   2.0  0.12 0.09189366
#> 24 1e+03   2.0  0.12 0.09189366
#> 25 1e-02   3.0  0.13 0.09486833
#> 26 1e-01   3.0  0.13 0.09486833
#> 27 1e+00   3.0  0.13 0.09486833
#> 28 1e+01   3.0  0.13 0.09486833
#> 29 1e+02   3.0  0.13 0.09486833
#> 30 1e+03   3.0  0.13 0.09486833
#> 31 1e-02   4.0  0.15 0.10801234
#> 32 1e-01   4.0  0.15 0.10801234
#> 33 1e+00   4.0  0.15 0.10801234
#> 34 1e+01   4.0  0.15 0.10801234
#> 35 1e+02   4.0  0.15 0.10801234
#> 36 1e+03   4.0  0.15 0.10801234

The lowest cross-validation error (0.12) is reached at gamma = 2, for every value of costs from 1e-02 to 1e+03 (compare the error column). Let’s dig into this:

set.seed(1)
output.tune.2 <- tune(svm, y ~ ., data = data1[train, ], kernel = "radial",
                    ranges = list(costs = c(0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000), 
                                  gamma = 2))
summary(output.tune.2)
#> 
#> Parameter tuning of 'svm':
#> 
#> - sampling method: 10-fold cross validation 
#> 
#> - best parameters:
#>  costs gamma
#>  1e-04     2
#> 
#> - best performance: 0.12 
#> 
#> - Detailed performance results:
#>   costs gamma error dispersion
#> 1 1e-04     2  0.12 0.09189366
#> 2 1e-03     2  0.12 0.09189366
#> 3 1e-02     2  0.12 0.09189366
#> 4 1e-01     2  0.12 0.09189366
#> 5 1e+00     2  0.12 0.09189366
#> 6 1e+01     2  0.12 0.09189366
#> 7 1e+02     2  0.12 0.09189366
#> 8 1e+03     2  0.12 0.09189366
#> 9 1e+04     2  0.12 0.09189366
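
The error column not moving at all across eight orders of magnitude is suspicious. A likely explanation is that the grid entry is named costs, whereas the argument of svm() is called cost: tune() passes the grid names on as arguments, an unknown costs is most probably swallowed silently, and every fit then runs with the default cost = 1. The sketch below repeats the search with the entry renamed to cost, on a sample rebuilt with the same recipe as above and a smaller grid. Its results are not shown because they were not part of the original run.

```r
library(e1071)

# Rebuild a two-class sample with the same recipe as above (offsets of 2)
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ]   <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
dat <- data.frame(x = x, y = factor(c(rep(1, 150), rep(2, 50))))
train_idx <- sample(200, 100)

# Grid entries must carry the argument names svm() knows: cost and gamma
set.seed(1)
tuned <- tune(svm, y ~ ., data = dat[train_idx, ], kernel = "radial",
              ranges = list(cost = c(0.01, 1, 100),
                            gamma = c(0.5, 2)))

print(tuned$performances)      # one row per cost/gamma combination
print(tuned$best.parameters)   # first row with the lowest error
```

If the explanation holds, the error column of this search should vary with cost as well as with gamma.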

Interestingly, the error stays at 0.12 for every value in the grid, so we can’t improve on the result by lowering the cost further. Because all rows tie, tune() simply reports the first one (1e-04) as the best parameter, so this should not be read as the SVM favouring a small cost.
But let’s see how we are doing with our prediction. Remember that we used only half of the data to train the SVM. Let’s test it on the other half:

tab <- table(true = data1[-train, "y"], pred = predict(output.tune.2$best.model, newdata = data1[-train, ]))
prop.table(tab)
#>     pred
#> true    1    2
#>    1 0.74 0.03
#>    2 0.07 0.16
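
The share of misclassified test rows is everything outside the diagonal of this table. A small sketch, with the printed proportions re-entered by hand (so tab here is a plain matrix, not the original table object):

```r
# Proportions copied from the output above; rows = true class, columns = prediction
tab <- matrix(c(0.74, 0.07, 0.03, 0.16), nrow = 2,
              dimnames = list(true = c("1", "2"), pred = c("1", "2")))

# Overall error: 1 minus the correctly classified share on the diagonal
error_rate <- 1 - sum(diag(tab))
print(round(error_rate, 2))

# Share of each true class that was predicted correctly
per_class_accuracy <- diag(tab) / rowSums(tab)
print(round(per_class_accuracy, 3))
```

The per-class view is worth a look because the classes are unbalanced: the smaller class 2 is recognised less reliably than class 1.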

Conclusion

Our SVM misclassified 10% of the test data.
The plot for this SVM together with the test data looks like this:

plot(output.tune.2$best.model, data1[-train, ])

Up next

  • SVM for multiple classes
  • ROC Curve via the ROCR package