SVM Experiment

Continuing from my previous work with the bank marketing data, this analysis applies support vector machines to the same classification task.

library(tidyverse)
library(caret)
library(e1071)
library(randomForest)

The steps for data import, pre-processing, and partitioning are all repeated from the previous work. The experiment log is imported as well.

bank_raw <- read.csv2(file="bank+marketing/bank/bank-full.csv")

bank <- bank_raw |>
  mutate(poutcome = na_if(poutcome, "unknown"),
         poutcome = na_if(poutcome, "other"))

chr_cols <- c("job", "marital", "education", "default", "housing", "loan", "contact", "month", "poutcome", "y")
bank <- bank |> mutate(across(all_of(chr_cols), as.factor))

head(bank)
set.seed(123)

splitIndex <- createDataPartition(bank$y, p = 0.8, list = FALSE)

bank_train <- bank[splitIndex,]
bank_test <- bank[-splitIndex,]

round(prop.table(table(select(bank, y))), 2)
## y
##   no  yes 
## 0.88 0.12
round(prop.table(table(select(bank_train, y))), 2)
## y
##   no  yes 
## 0.88 0.12
round(prop.table(table(select(bank_test, y))), 2)
## y
##   no  yes 
## 0.88 0.12
# Import the log from the previous experiments for comparison.
experiment_log <- read_csv("experiment_log.csv")

Experiment 7:

Objective: We will test whether a support vector machine can classify this banking dataset better than the algorithms from the previous experiments.

Variations: This first model will use a linear kernel with the default cost (hardness/softness of margin) of 1.

Evaluation: A table will be generated to view the SVM predictions against the actual values, along with the confusion matrix for accuracy.

Experiment:

First, as with random forests, any missing values in the data need to be imputed; for consistency, I will again apply na.roughfix. Next, the numeric columns must be scaled because SVMs use distances between data points to determine the hyperplane and make classifications. If one column spans a much larger range than the others, it can dominate those distance calculations; scaling balances the features’ contributions.
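For reference, this is roughly what that scaling amounts to: standardize each numeric column with the training set’s means and standard deviations, then apply those same training statistics to the test set. This is a minimal sketch for illustration only; svm()’s scale argument performs the equivalent standardization internally.

# illustrative only: standardize the numeric columns using training statistics
num_names <- names(bank_train)[sapply(bank_train, is.numeric)]
train_means <- sapply(bank_train[num_names], mean)
train_sds <- sapply(bank_train[num_names], sd)

bank_train_sc <- bank_train
bank_train_sc[num_names] <- scale(bank_train[num_names], center = train_means, scale = train_sds)

bank_test_sc <- bank_test
bank_test_sc[num_names] <- scale(bank_test[num_names], center = train_means, scale = train_sds)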

set.seed(123)

bank_train3 <- na.roughfix(bank_train)
bank_test3 <- na.roughfix(bank_test)

num_cols <- sapply(bank_train, is.numeric)
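# note: the fit below uses bank_train, not the imputed bank_train3; svm()'s
# default na.action = na.omit silently drops the rows where poutcome is NA,
# while the imputed sets are used for prediction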

bank_svm1 <- svm(y ~.,
                 data = bank_train,
                 scale = num_cols,
                 kernel = "linear")

summary(bank_svm1)
## 
## Call:
## svm(formula = y ~ ., data = bank_train, kernel = "linear", scale = num_cols)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
## 
## Number of Support Vectors:  1894
## 
##  ( 953 941 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  no yes
# predict and evaluate on training data
svm1_train_pred <- predict(bank_svm1, bank_train3)
table(predict = svm1_train_pred, truth = bank_train3$y)
##        truth
## predict    no   yes
##     no  31423  3282
##     yes   515   950
svm1_train_cm <- confusionMatrix(svm1_train_pred, bank_train3$y)
svm1_train_cm$overall["Accuracy"]
##  Accuracy 
## 0.8950235
# predict and evaluate on testing data
svm1_test_pred <- predict(bank_svm1, bank_test3)
table(predict = svm1_test_pred, truth = bank_test3$y)
##        truth
## predict   no  yes
##     no  7862  830
##     yes  122  227
svm1_test_cm <- confusionMatrix(svm1_test_pred, bank_test3$y)
svm1_test_cm$overall["Accuracy"]
##  Accuracy 
## 0.8947019

With this model, there are 32373 correct classifications and 3797 errors on the training data. On the testing set, there are 952 errors against 8089 correct classifications. These accuracies are about the same as in the previous decision tree experiments, and only modestly above the 0.88 no-information rate of always predicting “no”.
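Given the 88/12 class imbalance, accuracy alone is a weak summary here: most of the errors are missed “yes” cases. The caret confusion matrix computed above already carries the no-information rate and the per-class rates, for example:

# extra metrics from the existing caret confusion matrix
svm1_test_cm$overall["AccuracyNull"]                   # accuracy of always predicting the majority class
svm1_test_cm$byClass[c("Sensitivity", "Specificity")]  # per-class rates (positive class is "no")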

svm1_log <- data.frame(
  ID = 7,
  Model = "SVM",
  Features = "all",
  Hyperparameters = "cost = 1",
  Train = 0.90,
  Test = 0.89,
  Notes = "same accuracy as decision tree experiments"
)

experiment_log <- bind_rows(experiment_log, svm1_log)

Experiment 8:

Objective: To see if we can improve on this, 10-fold cross-validation will be applied with several commonly used cost values to determine the best-performing model for training and testing.

Variations: Based on hyperparameter tuning, the cost will be either 0.01, 0.1, 1 (the Experiment 7 default), or 10.

Evaluation: The same table and accuracy will be generated.

Experiment:
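Note that tune() performs 10-fold cross-validation by default; the sampling can be made explicit, and the fold assignment reproducible, with a seed and tune.control. A minimal sketch, not used in the run below:

# optional: state the default sampling explicitly and fix the folds
set.seed(123)
ctrl <- tune.control(sampling = "cross", cross = 10)  # these are the defaults
# then pass tunecontrol = ctrl in the tune() call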

tune_mod <- tune(svm,
                 y ~.,
                 data = bank_train,
                 kernel = "linear",
                 ranges = list(cost = c(0.01, 0.1, 1, 10)))

summary(tune_mod)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##  0.01
## 
## - best performance: 0.1775158 
## 
## - Detailed performance results:
##    cost     error  dispersion
## 1  0.01 0.1775158 0.006975374
## 2  0.10 0.1808359 0.005952814
## 3  1.00 0.1816526 0.005950147
## 4 10.00 0.1810692 0.006229219

The error rates barely vary across the cost values, but the best model was determined to have a cost of 0.01.
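e1071 also keeps the detailed results on the tune object and provides a quick plot, which makes the flat error curve easy to see:

# inspect and visualize the cross-validation results
tune_mod$performances  # the detailed results table shown above
plot(tune_mod)         # CV error vs. cost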

best_mod <- tune_mod$best.model

# predict and evaluate on training data
best_train_pred <- predict(best_mod, bank_train3)
table(predict = best_train_pred, truth = bank_train3$y)
##        truth
## predict    no   yes
##     no  31354  3169
##     yes   584  1063
best_train_cm <- confusionMatrix(best_train_pred, bank_train3$y)
best_train_cm$overall["Accuracy"]
## Accuracy 
##  0.89624
# predict and evaluate on testing data
best_test_pred <- predict(best_mod, bank_test3)
table(predict = best_test_pred, truth = bank_test3$y)
##        truth
## predict   no  yes
##     no  7841  808
##     yes  143  249
best_test_cm <- confusionMatrix(best_test_pred, bank_test3$y)
best_test_cm$overall["Accuracy"]
##  Accuracy 
## 0.8948125

In this case, the number of training errors was 3753, a small drop of 44. On testing, there were 951 errors; the overall improvement was minuscule.

svm2_log <- data.frame(
  ID = 8,
  Model = "SVM",
  Features = "all",
  Hyperparameters = "tuned to best cost = 0.01",
  Train = 0.90,
  Test = 0.89,
  Notes = "no real improvement"
)

experiment_log <- bind_rows(experiment_log, svm2_log)

Experiment 9:

Objective: To see if changing the kernel from linear to the radial basis function (RBF), a common non-linear kernel, will affect performance.

Variations: The kernel will be changed; and with non-linear kernels, the gamma hyperparameter also comes into play, determining how much influence individual points have on the hyperplane, or essentially how smooth or sensitive the decision boundary will be. svm’s default value is 1/(data dimension). The short sketch below illustrates the effect.
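To make that concrete, here is a minimal sketch of how gamma controls the rate at which a point’s influence decays with distance (the rbf_kernel helper is hypothetical, written only for illustration):

# RBF kernel value between two feature vectors: exp(-gamma * ||x1 - x2||^2)
# (illustrative helper, not part of e1071)
rbf_kernel <- function(x1, x2, gamma) exp(-gamma * sum((x1 - x2)^2))

x1 <- c(0, 0); x2 <- c(1, 1)
rbf_kernel(x1, x2, gamma = 0.01)  # ~0.98: low gamma, distant points still look similar (smoother boundary)
rbf_kernel(x1, x2, gamma = 1)     # ~0.14: high gamma, influence is local (more sensitive boundary)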

Evaluation: The same table and accuracy will be generated.

Experiment:

set.seed(123)

num_cols <- sapply(bank_train, is.numeric)

bank_svm3 <- svm(y ~.,
                 data = bank_train,
                 scale = num_cols,
                 cost = 0.1,
                 kernel = "radial")

summary(bank_svm3)
## 
## Call:
## svm(formula = y ~ ., data = bank_train, cost = 0.1, kernel = "radial", 
##     scale = num_cols)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.1 
## 
## Number of Support Vectors:  2501
## 
##  ( 1257 1244 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  no yes
# predict and evaluate on training data
svm3_train_pred <- predict(bank_svm3, bank_train3)
table(predict = svm3_train_pred, truth = bank_train3$y)
##        truth
## predict    no   yes
##     no  31186  2860
##     yes   752  1372
svm3_train_cm <- confusionMatrix(svm3_train_pred, bank_train3$y)
svm3_train_cm$overall["Accuracy"]
##  Accuracy 
## 0.9001382
# predict and evaluate on testing data
svm3_test_pred <- predict(bank_svm3, bank_test3)
table(predict = svm3_test_pred, truth = bank_test3$y)
##        truth
## predict   no  yes
##     no  7801  732
##     yes  183  325
svm3_test_cm <- confusionMatrix(svm3_test_pred, bank_test3$y)
svm3_test_cm$overall["Accuracy"]
##  Accuracy 
## 0.8987944

Again, there is minimal change to the errors and accuracy.

svm3_log <- data.frame(
  ID = 9,
  Model = "SVM",
  Features = "all",
  Hyperparameters = "RBF kernel, cost = 0.1, gamma = 0.024",
  Train = 0.90,
  Test = 0.90,
  Notes = "no real improvement"
)

experiment_log <- bind_rows(experiment_log, svm3_log)

Experiment 10:

Objective: To perform cross-validation on models with different, commonly used gamma values.

Variations: Based on this hyperparameter tuning, the gamma will be either 0.001, 0.024 (the Experiment 9 default), 0.1, or 1.

Evaluation: The same table and accuracy will be generated.

Experiment:

RBF_tune_mod <- tune(svm,
                     y ~.,
                     data = bank_train,
                     kernel = "radial",
                     ranges = list(gamma = c(0.001, 0.024, 0.1, 1)))

summary(RBF_tune_mod)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma
##    0.1
## 
## - best performance: 0.1607619 
## 
## - Detailed performance results:
##   gamma     error  dispersion
## 1 0.001 0.2163326 0.015528251
## 2 0.024 0.1726482 0.009425219
## 3 0.100 0.1607619 0.011262381
## 4 1.000 0.2410393 0.021859604

A gamma of 0.1 was determined to give the best performance, with a cross-validation error of about 0.16; but this is only a modest improvement on the 0.17 error at the default gamma.
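As an aside, the tune object exposes these results directly through its accessors:

# accessors on the tune object (values match the summary above)
RBF_tune_mod$best.parameters       # gamma = 0.1
1 - RBF_tune_mod$best.performance  # best cross-validation accuracy, about 0.84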

best_rbf_mod <- RBF_tune_mod$best.model

# predict and evaluate on training data
brbf_train_pred <- predict(best_rbf_mod, bank_train3)
table(predict = brbf_train_pred, truth = bank_train3$y)
##        truth
## predict    no   yes
##     no  31318  2883
##     yes   620  1349
brbf_train_cm <- confusionMatrix(brbf_train_pred, bank_train3$y)
brbf_train_cm$overall["Accuracy"]
##  Accuracy 
## 0.9031518
# predict and evaluate on testing data
brbf_test_pred <- predict(best_rbf_mod, bank_test3)
table(predict = brbf_test_pred, truth = bank_test3$y)
##        truth
## predict   no  yes
##     no  7809  763
##     yes  175  294
brbf_test_cm <- confusionMatrix(brbf_test_pred, bank_test3$y)
brbf_test_cm$overall["Accuracy"]
##  Accuracy 
## 0.8962504

Once again, hyperparameter tuning appeared to have little effect on the performance.

svm4_log <- data.frame(
  ID = 10,
  Model = "SVM",
  Features = "all",
  Hyperparameters = "tuned to best gamma = 0.1",
  Train = 0.90,
  Test = 0.90,
  Notes = "no real improvement"
)

experiment_log <- bind_rows(experiment_log, svm4_log)

SVM Comparison

knitr::kable(experiment_log, format = "pipe", padding = 0)
|ID|Model|Features|Hyperparameters|Train|Test|Notes|
|--|-----|--------|---------------|-----|----|-----|
|1|Decision Tree|duration, poutcome, pdays|none|0.90|0.89|marketing features only|
|2|Decision Tree|poutcome, pdays|none|0.89|0.89|dropped duration, minimal changes|
|3|Random Forest|all, with different ranking order from decision trees after ‘duration’|impute method, number of trees|1.00|0.85|overfitting|
|4|Random Forest|ranked ‘duration’, ‘month’, and ‘poutcome’|leaf size, number of features randomly sampled|0.85|0.81|less accurate, lowered variance|
|5|XGBoost|all|nrounds = 100, defaults|0.96|0.91|duration ranked first|
|6|XGBoost|all|k-fold cross-validation, gamma, minimum child weight, nrounds = 55|0.94|0.91|boosting rounds reduced significantly, similar accuracy|
|7|SVM|all|cost = 1|0.90|0.89|same accuracy as decision tree experiments|
|8|SVM|all|tuned to best cost = 0.01|0.90|0.89|no real improvement|
|9|SVM|all|RBF kernel, cost = 0.1, gamma = 0.024|0.90|0.90|no real improvement|
|10|SVM|all|tuned to best gamma = 0.1|0.90|0.90|no real improvement|

Conclusion

The previous experiments determined that XGBoost was the best-performing model by accuracy, ahead of decision trees and random forests. Adding the results of these SVM experiments has not changed that conclusion. The SVM models, even when tuned for the best cost or gamma hyperparameters, varied little in accuracy from one another, and they did not meaningfully exceed the previous second-best algorithm, decision trees (which, unlike SVMs, required no particular data manipulation such as imputation, encoding, or scaling before training).

In the context of this binary classification problem using this large and multidimensional banking dataset, the XGBoost ensemble method performed better than single models like SVM and decision tree, as could be expected.