Support Vector Machines (SVMs) are versatile supervised learning models that shine in classification tasks but are also effective for regression. Today, we’ll explore them using a classic dataset: the Optical Recognition of Handwritten Digits, where the goal is to classify handwritten digits from 0 to 9. You can use these links to download the training data and test data, and then load them into R:
At its core, an SVM aims to find the optimal hyperplane that best separates data points belonging to different classes in a high-dimensional space. Think of a hyperplane as a decision boundary. Consider a simple 2D example: if you have two classes of data points (say, red circles and black stars) scattered on a graph, a linear SVM tries to find the straight line that separates them with the largest possible “margin” between the closest points of each class.
Hyperplane: The decision boundary that separates the classes. In 2D, it’s a line; in 3D, it’s a plane; and in higher dimensions, it’s a hyperplane.
Margin: The distance between the hyperplane and the closest data points from each class. SVMs try to maximize this margin. A larger margin generally leads to better generalization on unseen data.
Support Vectors: These are the data points closest to the hyperplane (lying on the margin). They are the critical elements because if you move them, the hyperplane’s position changes. All other data points are less influential.
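To make the margin and support vectors concrete, here is a minimal sketch (separate from the digits workflow below) that fits a linear SVM to two well-separated 2D clusters with e1071 and reports which points end up as support vectors; the toy data and object names are made up purely for illustration:
# Toy 2D example: two separable clusters, linear SVM, inspect support vectors
library(e1071)
set.seed(42)
toy <- data.frame(
  x1 = c(rnorm(20, mean = -2), rnorm(20, mean = 2)),
  x2 = c(rnorm(20, mean = -2), rnorm(20, mean = 2)),
  class = factor(rep(c("circle", "star"), each = 20))
)
toy_fit <- svm(class ~ ., data = toy, kernel = "linear", cost = 1, scale = TRUE)
toy_fit$index      # row indices of the support vectors (the points on or inside the margin)
plot(toy_fit, toy) # e1071's plot marks support vectors with crosses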
What if our data isn’t linearly separable? What if the red circles are inside the stars? A straight line won’t cut it. This is where the kernel trick comes in! 🤯
The kernel trick allows SVMs to implicitly map data into a higher-dimensional space where it might become linearly separable, without actually calculating the coordinates in that higher dimension. This saves a lot of computational cost. It’s like projecting your data onto a new, more complex surface where you can draw a straight line to separate them.
Common kernels include:
- Linear Kernel: For linearly separable data; it finds a straight hyperplane in the original feature space.
- Polynomial Kernel: Useful for non-linear relationships; it transforms the data into a polynomial feature space.
- Radial Basis Function (RBF) Kernel (also known as the Gaussian kernel): One of the most popular and powerful kernels. It can handle highly non-linear relationships by implicitly mapping data into an infinite-dimensional space, using a similarity measure based on Euclidean distance to decide how alike two points are (see the short sketch after this list).
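To see what a kernel actually computes, here is a tiny hand-rolled sketch of the RBF kernel, k(x, z) = exp(-gamma * ||x - z||^2); the function name and the gamma value are purely illustrative:
# RBF (Gaussian) kernel: similarity decays with squared Euclidean distance
rbf_kernel <- function(x, z, gamma = 0.5) {
  exp(-gamma * sum((x - z)^2))
}
rbf_kernel(c(1, 2), c(1, 2)) # identical points -> 1 (maximum similarity)
rbf_kernel(c(1, 2), c(4, 6)) # distant points  -> close to 0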
Let’s apply SVMs to the Optical Recognition of Handwritten Digits. This dataset contains images of handwritten digits (0-9), each represented by 64 features (8x8 pixel values). We have a training and test set.
# Install necessary packages if you haven't already
# install.packages("dplyr")
# install.packages("caret")
# install.packages("e1071") # e1071 contains the svm function
library(dplyr)
library(readr)
library(caret)
library(e1071) # Provides the 'svm' function
# --- 1. Load the data ---
# Make sure you have downloaded the 'optdigits.tra' and 'optdigits.tes' files
# and placed them in your working directory, or provide the full path.
# Load training data
#train_data_path <- "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra"
#train_df <- read.csv("C:/Users/email/Downloads/optdigits_train.csv", header = FALSE)
train_df <- read_csv("optdigits_train.csv.txt", col_names = FALSE)
Rows: 3823 Columns: 65
── Column specification ────────────────────────────────
Delimiter: ","
dbl (65): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Load test data
#test_data_path <- "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes"
test_df <- read_csv("optdigits_test.csv.txt", col_names = FALSE)
Rows: 1797 Columns: 65
── Column specification ────────────────────────────────
Delimiter: ","
dbl (65): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# The last column (X65) is the target variable (digit 0-9)
# The other 64 columns (X1-X64) are the pixel features.
# Rename the target variable for clarity
names(train_df)[65] <- "digit"
names(test_df)[65] <- "digit"
# Convert the target variable to a factor, which is required for classification in caret
train_df$digit <- as.factor(train_df$digit)
test_df$digit <- as.factor(test_df$digit)
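Before modeling, a quick sanity check that both data frames have 64 pixel columns plus the digit label, and that the classes are roughly balanced, can catch loading problems early:
# Quick sanity checks on the loaded data
dim(train_df)         # expect 3823 rows x 65 columns
dim(test_df)          # expect 1797 rows x 65 columns
table(train_df$digit) # counts per digit 0-9 (roughly balanced)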
# --- 2. Prepare for training (optional: create smaller subset for quicker testing) ---
# For demonstration, you could train on a smaller subset to make training faster.
# Uncomment the sampling lines below (and comment out the full-set assignments) to use a 1,000-row sample; by default the full dataset is used.
set.seed(123) # for reproducibility
# train_indices <- sample(1:nrow(train_df), size = 1000) # Use 1000 samples for quicker demo
# train_subset <- train_df[train_indices, ]
# test_subset <- test_df # Keep test_df for evaluation
train_subset <- train_df # Use full training set
test_subset <- test_df # Use full test set
# Define training control for cross-validation
# This helps in getting a more robust estimate of model performance
train_control <- trainControl(method = "cv", # Cross-validation
                              number = 5)    # 5-fold cross-validation
# --- 3. Train SVM Models with Different Kernels ---
cat("Training SVM with Linear Kernel...\n")
Training SVM with Linear Kernel...
# Linear Kernel SVM
# The 'method' parameter corresponds to 'svmLinear' in caret.
# The 'C' (cost) parameter controls the trade-off between misclassification
# and margin maximization. Note that caret's default grid for 'svmLinear'
# holds C at 1 (as the output below shows); supply a tuneGrid to try other values.
svm_linear_model <- train(digit ~ .,
                          data = train_subset,
                          method = "svmLinear",
                          trControl = train_control,
                          preProcess = c("center", "scale"), # Center and scale features
                          tuneLength = 3)
Warning: These variables have zero variances: X1, X40 (and X57 in one fold)
Warning: Variable(s) constant. Cannot scale data.
(These warnings are repeated for each cross-validation fit.)
print(svm_linear_model)
Support Vector Machines with Linear Kernel
3823 samples
64 predictor
10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
Pre-processing: centered (64), scaled (64)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 3058, 3058, 3058, 3057, 3061
Resampling results:
Accuracy Kappa
0.9809081 0.978786
Tuning parameter 'C' was held constant at a value of 1
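The zero-variance warnings above come from pixel columns (X1 and X40, plus X57 in some folds) that are constant in the training data, so they cannot be centered and scaled. They are harmless here, but if you want to silence them you could drop those columns before training, for example with caret's nearZeroVar() (or by adding "zv" to the preProcess vector, which filters zero-variance predictors); a minimal sketch:
# Optional: identify and drop constant (zero-variance) pixel columns
zv_metrics <- nearZeroVar(train_subset[, -65], saveMetrics = TRUE)
constant_cols <- rownames(zv_metrics)[zv_metrics$zeroVar]
constant_cols # e.g., "X1" "X40"
# train_subset <- train_subset[, !(names(train_subset) %in% constant_cols)]
# test_subset  <- test_subset[, !(names(test_subset) %in% constant_cols)]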
cat("\nTraining SVM with Radial Basis Function (RBF) Kernel...\n")
Training SVM with Radial Basis Function (RBF) Kernel...
# RBF Kernel SVM (Radial)
# The 'method' parameter corresponds to 'svmRadial' in caret.
# caret tunes 'C' (cost) and estimates 'sigma' (called gamma in some libraries),
# which controls the influence of a single training example.
svm_radial_model <- train(digit ~ .,
                          data = train_subset,
                          method = "svmRadial",
                          trControl = train_control,
                          preProcess = c("center", "scale"),
                          tuneLength = 3)
Warning: These variables have zero variances: X1, X40 (X57 in some folds)
Warning: Variable(s) constant. Cannot scale data.
(Repeated for each cross-validation fit across the tuning grid.)
print(svm_radial_model)
Support Vector Machines with Radial Basis Function Kernel
3823 samples
64 predictor
10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
Pre-processing: centered (64), scaled (64)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 3058, 3059, 3057, 3058, 3060
Resampling results across tuning parameters:
C Accuracy Kappa
0.25 0.9704417 0.9671558
0.50 0.9803808 0.9782000
1.00 0.9848270 0.9831405
Tuning parameter 'sigma' was held constant at a value of 0.01192708
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01192708 and C = 1.
cat("\nTraining SVM with Polynomial Kernel...\n")
Training SVM with Polynomial Kernel...
# Polynomial Kernel SVM
# The 'method' parameter corresponds to 'svmPoly' in caret
# We're tuning 'C' (cost), 'degree' (polynomial degree), and 'scale'
svm_poly_model <- train(digit ~ .,
                        data = train_subset,
                        method = "svmPoly",
                        trControl = train_control,
                        preProcess = c("center", "scale"),
                        tuneLength = 3)
Warning: These variables have zero variances: X1, X40 (X57 in some folds)
Warning: Variable(s) constant. Cannot scale data.
(Repeated for each cross-validation fit across the polynomial tuning grid.)
print(svm_poly_model)
Support Vector Machines with Polynomial Kernel
3823 samples
64 predictor
10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
Pre-processing: centered (64), scaled (64)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 3055, 3058, 3060, 3059, 3060
Resampling results across tuning parameters:
degree scale C Accuracy Kappa
1 0.001 0.25 0.9212657 0.9125119
1 0.001 0.50 0.9437638 0.9375135
1 0.001 1.00 0.9573669 0.9526286
1 0.010 0.25 0.9673013 0.9636670
1 0.010 0.50 0.9738437 0.9709366
1 0.010 1.00 0.9780326 0.9755912
1 0.100 0.25 0.9806439 0.9784927
1 0.100 0.50 0.9822143 0.9802375
1 0.100 1.00 0.9837873 0.9819854
2 0.001 0.25 0.9440242 0.9378027
2 0.001 0.50 0.9584116 0.9537894
2 0.001 1.00 0.9659941 0.9622144
2 0.010 0.25 0.9801238 0.9779148
2 0.010 0.50 0.9840460 0.9822730
2 0.010 1.00 0.9861389 0.9845984
2 0.100 0.25 0.9903182 0.9892421
2 0.100 0.50 0.9897939 0.9886596
2 0.100 1.00 0.9897939 0.9886596
3 0.001 0.25 0.9531818 0.9479783
3 0.001 0.50 0.9646882 0.9607634
3 0.001 1.00 0.9707051 0.9674491
3 0.010 0.25 0.9856126 0.9840137
3 0.010 0.50 0.9874458 0.9860505
3 0.010 1.00 0.9877082 0.9863422
3 0.100 0.25 0.9897943 0.9886600
3 0.100 0.50 0.9897943 0.9886600
3 0.100 1.00 0.9897943 0.9886600
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were degree = 2, scale = 0.1 and
C = 0.25.
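caret also provides a plot method for train objects, which is a convenient way to see how cross-validated accuracy varied across the polynomial kernel's tuning grid (degree, scale, and C):
# Visualize accuracy across the tuning grid, or inspect the full table
plot(svm_poly_model)
head(svm_poly_model$results)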
# --- 4. Evaluate the Models on the Test Set ---
cat("\nEvaluating Linear Kernel SVM...\n")
Evaluating Linear Kernel SVM...
predictions_linear <- predict(svm_linear_model, newdata = test_subset)
confusion_matrix_linear <- confusionMatrix(predictions_linear, test_subset$digit)
print(confusion_matrix_linear)
Confusion Matrix and Statistics
Reference
Prediction 0 1 2 3 4 5 6 7 8 9
0 178 0 0 1 0 0 0 0 0 1
1 0 176 5 0 0 0 0 0 9 1
2 0 0 171 4 0 1 0 0 1 0
3 0 0 0 172 0 1 0 0 3 5
4 0 0 0 0 180 0 2 1 0 3
5 0 0 0 2 0 179 0 4 2 3
6 0 3 1 0 0 0 178 0 0 0
7 0 0 0 1 0 0 0 167 0 0
8 0 2 0 2 1 0 1 0 158 2
9 0 1 0 1 0 1 0 7 1 165
Overall Statistics
Accuracy : 0.9594
95% CI : (0.9492, 0.968)
No Information Rate : 0.1018
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9549
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6 Class: 7 Class: 8 Class: 9
Sensitivity 1.00000 0.96703 0.96610 0.93989 0.9945 0.98352 0.98343 0.93296 0.90805 0.91667
Specificity 0.99876 0.99071 0.99630 0.99442 0.9963 0.99319 0.99752 0.99938 0.99507 0.99320
Pos Pred Value 0.98889 0.92147 0.96610 0.95028 0.9677 0.94211 0.97802 0.99405 0.95181 0.93750
Neg Pred Value 1.00000 0.99626 0.99630 0.99319 0.9994 0.99813 0.99814 0.99263 0.99019 0.99075
Prevalence 0.09905 0.10128 0.09850 0.10184 0.1007 0.10128 0.10072 0.09961 0.09683 0.10017
Detection Rate 0.09905 0.09794 0.09516 0.09572 0.1002 0.09961 0.09905 0.09293 0.08792 0.09182
Detection Prevalence 0.10017 0.10629 0.09850 0.10072 0.1035 0.10573 0.10128 0.09349 0.09238 0.09794
Balanced Accuracy 0.99938 0.97887 0.98120 0.96716 0.9954 0.98835 0.99048 0.96617 0.95156 0.95493
cat("\nEvaluating RBF Kernel SVM...\n")
Evaluating RBF Kernel SVM...
predictions_radial <- predict(svm_radial_model, newdata = test_subset)
confusion_matrix_radial <- confusionMatrix(predictions_radial, test_subset$digit)
print(confusion_matrix_radial)
Confusion Matrix and Statistics
Reference
Prediction 0 1 2 3 4 5 6 7 8 9
0 177 0 0 0 0 0 1 0 0 0
1 0 180 6 0 1 0 0 0 6 0
2 0 0 168 2 0 0 0 0 0 0
3 0 0 0 173 0 0 0 0 0 1
4 1 1 3 1 178 0 0 0 2 0
5 0 0 0 3 0 181 0 1 1 1
6 0 0 0 0 0 0 180 0 0 1
7 0 0 0 1 0 0 0 171 0 1
8 0 1 0 1 2 0 0 0 159 2
9 0 0 0 2 0 1 0 7 6 174
Overall Statistics
Accuracy : 0.9688
95% CI : (0.9597, 0.9764)
No Information Rate : 0.1018
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9654
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6 Class: 7 Class: 8 Class: 9
Sensitivity 0.99438 0.9890 0.94915 0.94536 0.98343 0.9945 0.9945 0.95531 0.91379 0.96667
Specificity 0.99938 0.9920 0.99877 0.99938 0.99505 0.9963 0.9994 0.99876 0.99630 0.99011
Pos Pred Value 0.99438 0.9326 0.98824 0.99425 0.95699 0.9679 0.9945 0.98844 0.96364 0.91579
Neg Pred Value 0.99938 0.9988 0.99447 0.99384 0.99814 0.9994 0.9994 0.99507 0.99081 0.99627
Prevalence 0.09905 0.1013 0.09850 0.10184 0.10072 0.1013 0.1007 0.09961 0.09683 0.10017
Detection Rate 0.09850 0.1002 0.09349 0.09627 0.09905 0.1007 0.1002 0.09516 0.08848 0.09683
Detection Prevalence 0.09905 0.1074 0.09460 0.09683 0.10351 0.1041 0.1007 0.09627 0.09182 0.10573
Balanced Accuracy 0.99688 0.9905 0.97396 0.97237 0.98924 0.9954 0.9969 0.97704 0.95505 0.97839
cat("\nEvaluating Polynomial Kernel SVM...\n")
Evaluating Polynomial Kernel SVM...
predictions_poly <- predict(svm_poly_model, newdata = test_subset)
confusion_matrix_poly <- confusionMatrix(predictions_poly, test_subset$digit)
print(confusion_matrix_poly)
Confusion Matrix and Statistics
Reference
Prediction 0 1 2 3 4 5 6 7 8 9
0 178 0 0 0 0 0 2 0 0 0
1 0 180 2 0 0 0 0 0 4 0
2 0 0 171 1 0 0 0 0 0 0
3 0 0 0 177 0 0 0 0 1 2
4 0 0 0 0 180 0 0 0 0 1
5 0 0 0 1 0 181 0 2 0 1
6 0 1 3 0 0 0 179 0 0 0
7 0 0 0 0 0 0 0 170 0 0
8 0 1 1 3 1 0 0 0 166 2
9 0 0 0 1 0 1 0 7 3 174
Overall Statistics
Accuracy : 0.9772
95% CI : (0.9692, 0.9836)
No Information Rate : 0.1018
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9746
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6 Class: 7 Class: 8 Class: 9
Sensitivity 1.00000 0.9890 0.96610 0.9672 0.9945 0.9945 0.98895 0.94972 0.95402 0.96667
Specificity 0.99876 0.9963 0.99938 0.9981 0.9994 0.9975 0.99752 1.00000 0.99507 0.99258
Pos Pred Value 0.98889 0.9677 0.99419 0.9833 0.9945 0.9784 0.97814 1.00000 0.95402 0.93548
Neg Pred Value 1.00000 0.9988 0.99631 0.9963 0.9994 0.9994 0.99876 0.99447 0.99507 0.99628
Prevalence 0.09905 0.1013 0.09850 0.1018 0.1007 0.1013 0.10072 0.09961 0.09683 0.10017
Detection Rate 0.09905 0.1002 0.09516 0.0985 0.1002 0.1007 0.09961 0.09460 0.09238 0.09683
Detection Prevalence 0.10017 0.1035 0.09572 0.1002 0.1007 0.1029 0.10184 0.09460 0.09683 0.10351
Balanced Accuracy 0.99938 0.9926 0.98274 0.9827 0.9969 0.9960 0.99324 0.97486 0.97455 0.97962
cat("\n--- Comparison of Model Accuracies ---\n")
--- Comparison of Model Accuracies ---
cat("Linear SVM Accuracy:", round(confusion_matrix_linear$overall['Accuracy'], 4), "\n")
Linear SVM Accuracy: 0.9594
cat("RBF SVM Accuracy:", round(confusion_matrix_radial$overall['Accuracy'], 4), "\n")
RBF SVM Accuracy: 0.9688
cat("Polynomial SVM Accuracy:", round(confusion_matrix_poly$overall['Accuracy'], 4), "\n")
Polynomial SVM Accuracy: 0.9772
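Beyond comparing single test-set accuracies, caret's resamples() function can summarize how the three models performed across their cross-validation folds. A short sketch, assuming the three train objects created above (for a strictly paired comparison you would fix the resampling indices in trainControl):
# Compare cross-validated performance of the three kernels
cv_results <- resamples(list(Linear = svm_linear_model,
                             Radial = svm_radial_model,
                             Poly   = svm_poly_model))
summary(cv_results) # accuracy and kappa distributions per model
bwplot(cv_results)  # box-and-whisker comparison (lattice)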
When you run the R script, you’ll see a lot of output! Here’s what to look for:
train function output: For each model, caret will show you the results of the cross-validation. It tries different combinations of hyperparameters (like C, sigma, degree) and reports the accuracy for each. It then selects the best set of hyperparameters based on the highest accuracy.
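If you prefer to pull the selected hyperparameters out programmatically rather than reading the printed summaries, each train object stores them in bestTune:
# Hyperparameters caret selected for each kernel
svm_linear_model$bestTune # C
svm_radial_model$bestTune # sigma and C
svm_poly_model$bestTune   # degree, scale, and C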
Confusion Matrix: This is crucial for understanding performance.
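Besides the raw counts, the confusionMatrix object also exposes per-class statistics, which makes it easy to spot the digits a model struggles with. For example, for the RBF model:
# Per-class statistics (rows are digits 0-9)
round(confusion_matrix_radial$byClass[, c("Sensitivity", "Specificity", "Balanced Accuracy")], 3)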
You’ll notice the RBF kernel (svmRadial) often performs very well on this kind of complex, non-linear data because it’s good at capturing intricate relationships. The polynomial kernel can also perform well, depending on its degree. The linear kernel might struggle if the digit classes aren’t perfectly separable in their original feature space.
As shown above, changing the kernel is as simple as changing the method argument in the train function:
- method = "svmLinear" for a linear SVM.
- method = "svmRadial" for an RBF (radial) SVM.
- method = "svmPoly" for a polynomial SVM.
For each kernel, caret automatically handles some hyperparameter tuning (e.g., C for all SVMs, sigma for RBF, degree for polynomial). The tuneLength argument controls how many different values of these hyperparameters caret will try. You can also explicitly define a tuneGrid for more precise control over the tuning process.
# Example of an explicit tuneGrid for the RBF kernel
# my_tune_grid <- expand.grid(C = c(0.1, 1, 10),
#                             sigma = c(0.01, 0.1, 1))
# svm_radial_model_tuned <- train(digit ~ .,
#                                 data = train_subset,
#                                 method = "svmRadial",
#                                 trControl = train_control,
#                                 preProcess = c("center", "scale"),
#                                 tuneGrid = my_tune_grid)
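This particular grid would evaluate 3 × 3 = 9 (C, sigma) combinations, each with 5-fold cross-validation, so expect it to take noticeably longer than the tuneLength = 3 runs above.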
By changing the kernel and observing the accuracy and confusion matrices, you can clearly see how different kernels handle the complexity of the digit data. For this dataset, the non-linear kernels (especially RBF) are generally expected to outperform the linear kernel because handwritten digits involve intricate patterns that are not linearly separable.