Continuing from my previous work with the bank marketing data, this experiment and analysis will use support vector machines (SVMs) for classification.
library(tidyverse)
library(caret)
library(e1071)
library(randomForest)
The steps for data import, pre-processing, and partitioning are all repeated from the previous work. The experiment log is imported as well.
# bank-full.csv is semicolon-separated, so read.csv2 is used
bank_raw <- read.csv2(file="bank+marketing/bank/bank-full.csv")
bank <- bank_raw
# treat "unknown" and "other" poutcome values as missing
bank <- bank |>
  mutate(poutcome = na_if(poutcome, "unknown")) |>
  mutate(poutcome = na_if(poutcome, "other"))
# convert the character columns to factors for modeling
chr_cols <- c("job", "marital", "education", "default", "housing", "loan", "contact", "month", "poutcome", "y")
bank <- bank |> mutate(across(all_of(chr_cols), as.factor))
head(bank)
set.seed(123)
# stratified 80/20 split on the target variable y
splitIndex <- createDataPartition(bank$y, p = 0.8, list = FALSE)
bank_train <- bank[splitIndex,]
bank_test <- bank[-splitIndex,]
round(prop.table(table(select(bank, y))), 2)
## y
## no yes
## 0.88 0.12
round(prop.table(table(select(bank_train, y))), 2)
## y
## no yes
## 0.88 0.12
round(prop.table(table(select(bank_test, y))), 2)
## y
## no yes
## 0.88 0.12
# Import the log from the previous experiments for comparison.
experiment_log <- read_csv("experiment_log.csv")
Objective: We will test whether a support vector machine can classify this banking dataset better than the algorithms from the previous experiments.
Variations: This first model will use a linear kernel with the default cost (hardness/softness of margin) of 1.
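As an illustration of what cost controls, the sketch below fits the same linear kernel on an arbitrary subsample of 2000 training rows (my own choice, purely to keep it fast) at a few cost values; a softer margin (smaller cost) typically retains more support vectors.
set.seed(123)
demo_idx <- sample(nrow(bank_train), 2000)  # arbitrary subsample for speed
for (c_val in c(0.1, 1, 10)) {
  m <- svm(y ~ ., data = na.roughfix(bank_train[demo_idx, ]),
           kernel = "linear", cost = c_val)
  cat("cost =", c_val, "-> support vectors:", m$tot.nSV, "\n")
}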
Evaluation: A table will be generated to view the SVM predictions against the actual values, along with the confusion matrix for accuracy.
Experiment:
First, as with random forests, any missing values in the data will need to be imputed. For consistency, I will again apply na.roughfix. Next, the numeric columns must be scaled, because SVMs use distances between data points to determine the hyperplane and make classifications. If a column has a highly variable range of numbers, it could dominate the distance calculations; scaling helps balance the features' contributions.
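For a concrete sense of what these two steps do, here is a small sketch using columns from this dataset: na.roughfix() fills factor NAs with the most frequent level (and numeric NAs with the column median), and scale() performs the same centering and standardizing that svm()'s scale argument requests internally.
table(bank_train$poutcome, useNA = "ifany")   # NAs present before imputation
table(na.roughfix(bank_train)$poutcome)       # filled in with the modal level
demo_scaled <- bank_train |>
  mutate(across(c(age, balance, duration), ~ as.numeric(scale(.x))))
summary(select(demo_scaled, age, balance, duration))  # columns now centered near 0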
set.seed(123)
# impute missing values (factor NAs -> modal level) with randomForest's na.roughfix
bank_train3 <- na.roughfix(bank_train)
bank_test3 <- na.roughfix(bank_test)
# flag the numeric columns so svm() scales only those
num_cols <- sapply(bank_train, is.numeric)
# linear-kernel SVM with the default cost of 1
bank_svm1 <- svm(y ~ .,
                 data = bank_train,
                 scale = num_cols,
                 kernel = "linear")
summary(bank_svm1)
##
## Call:
## svm(formula = y ~ ., data = bank_train, kernel = "linear", scale = num_cols)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
##
## Number of Support Vectors: 1894
##
## ( 953 941 )
##
##
## Number of Classes: 2
##
## Levels:
## no yes
# predict and evaluate on training data
svm1_train_pred <- predict(bank_svm1, bank_train3)
table(predict = svm1_train_pred, truth = bank_train3$y)
## truth
## predict no yes
## no 31423 3282
## yes 515 950
svm1_train_cm <- confusionMatrix(svm1_train_pred, bank_train3$y)
svm1_train_cm$overall["Accuracy"]
## Accuracy
## 0.8950235
# predict and evaluate on testing data
svm1_test_pred <- predict(bank_svm1, bank_test3)
table(predict = svm1_test_pred, truth = bank_test3$y)
## truth
## predict no yes
## no 7862 830
## yes 122 227
svm1_test_cm <- confusionMatrix(svm1_test_pred, bank_test3$y)
svm1_test_cm$overall["Accuracy"]
## Accuracy
## 0.8947019
With this model, there are 32373 correct classifications and 3797 errors on the training data. On the testing set, there are 952 errors against 8089 correct classifications. The accuracy values from the confusion matrices are about the same as in the previous experiments using decision trees.
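The correct/error counts quoted above come straight from the cross-tabulations; this small helper (my own, not part of the original workflow) makes the arithmetic explicit.
# diagonal of the table = correct classifications, off-diagonal = errors
count_errors <- function(pred, truth) {
  tab <- table(pred, truth)
  c(correct = sum(diag(tab)), errors = sum(tab) - sum(diag(tab)))
}
count_errors(svm1_train_pred, bank_train3$y)  # 32373 correct, 3797 errors
count_errors(svm1_test_pred, bank_test3$y)    # 8089 correct, 952 errors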
svm1_log <- data.frame(
ID = 7,
Model = "SVM",
Features = "all",
Hyperparameters = "cost = 1",
Train = 0.90,
Test = 0.89,
Notes = "same accuracy as decision tree experiments"
)
experiment_log <- bind_rows(experiment_log, svm1_log)
Objective: To see if we can improve on this, 10-fold cross-validation will be applied with different, commonly used cost values to determine the best-performing model for training and testing.
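As an aside, the fold sampling that tune() uses can be stated explicitly (and seeded for reproducibility) through tune.control(); this is just a sketch of the option, since 10-fold cross-validation is already tune()'s default.
set.seed(123)
ctrl <- tune.control(sampling = "cross", cross = 10)  # the default scheme, made explicit
# passed to tune() via its tunecontrol argument, e.g.:
# tune(svm, y ~ ., data = bank_train, kernel = "linear",
#      ranges = list(cost = c(0.01, 0.1, 1, 10)), tunecontrol = ctrl)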
Variations: Based on hyperparameter tuning, the cost will be either 0.01, 0.1, 1 (same), or 10.
Evaluation: The same table and accuracy will be generated.
Experiment:
# grid search over cost; tune() defaults to 10-fold cross-validation
tune_mod <- tune(svm,
                 y ~ .,
                 data = bank_train,
                 kernel = "linear",
                 ranges = list(cost = c(0.01, 0.1, 1, 10)))
summary(tune_mod)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.01
##
## - best performance: 0.1775158
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.1775158 0.006975374
## 2 0.10 0.1808359 0.005952814
## 3 1.00 0.1816526 0.005950147
## 4 10.00 0.1810692 0.006229219
The error rates vary little across the grid, but the best model was determined to have a cost value of 0.01.
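The detailed grid printed above is also available programmatically from the tune object, which is handy for sorting or further inspection.
# full cross-validation results as a data frame, sorted by error
tune_mod$performances[order(tune_mod$performances$error), ]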
best_mod <- tune_mod$best.model
# predict and evaluate on training data
best_train_pred <- predict(best_mod, bank_train3)
table(predict = best_train_pred, truth = bank_train3$y)
## truth
## predict no yes
## no 31354 3169
## yes 584 1063
best_train_cm <- confusionMatrix(best_train_pred, bank_train3$y)
best_train_cm$overall["Accuracy"]
## Accuracy
## 0.89624
# predict and evaluate on testing data
best_test_pred <- predict(best_mod, bank_test3)
table(predict = best_test_pred, truth = bank_test3$y)
## truth
## predict no yes
## no 7841 808
## yes 143 249
best_test_cm <- confusionMatrix(best_test_pred, bank_test3$y)
best_test_cm$overall["Accuracy"]
## Accuracy
## 0.8948125
In this case, the number of errors on training was 3753, a small drop of 44. On testing, there were 951 errors; the overall improvement was minuscule.
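Putting the two linear-kernel models side by side (both values reproduced from the confusion matrices above):
data.frame(model = c("default cost = 1", "tuned cost = 0.01"),
           test_accuracy = unname(c(svm1_test_cm$overall["Accuracy"],
                                    best_test_cm$overall["Accuracy"])))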
svm2_log <- data.frame(
ID = 8,
Model = "SVM",
Features = "all",
Hyperparameters = "tuned to best cost = 0.01",
Train = 0.90,
Test = 0.89,
Notes = "no real improvement"
)
experiment_log <- bind_rows(experiment_log, svm2_log)
Objective: To see if changing the kernel from linear to the Radial Basis Function (RBF), a common non-linear kernel, will affect performance.
Variations: The kernel will be changed; and in the case of non-linear kernels, the gamma hyperparameter will be taken into account. Gamma determines how influential individual points are on the hyperplane, or basically how smooth/sensitive the decision boundary will be. The default value for svm is 1/(data dimension).
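To see where that default lands for this data, the sketch below mimics the model-matrix expansion that svm()'s formula interface performs (factors become dummy columns and the intercept is dropped); the exact column count is my reconstruction and depends on the factor levels present after imputation.
trms <- terms(y ~ ., data = bank_train3)
attr(trms, "intercept") <- 0      # svm's formula method drops the intercept
x_mat <- model.matrix(trms, data = bank_train3)
1 / ncol(x_mat)                   # approximately 0.024, matching the log entry below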
Evaluation: The same table and accuracy will be generated.
Experiment:
set.seed(123)
num_cols <- sapply(bank_train, is.numeric)
# RBF-kernel SVM; gamma is left at its default of 1/(data dimension)
bank_svm3 <- svm(y ~ .,
                 data = bank_train,
                 scale = num_cols,
                 cost = 0.1,
                 kernel = "radial")
summary(bank_svm3)
##
## Call:
## svm(formula = y ~ ., data = bank_train, cost = 0.1, kernel = "radial",
## scale = num_cols)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.1
##
## Number of Support Vectors: 2501
##
## ( 1257 1244 )
##
##
## Number of Classes: 2
##
## Levels:
## no yes
# predict and evaluate on training data
svm3_train_pred <- predict(bank_svm3, bank_train3)
table(predict = svm3_train_pred, truth = bank_train3$y)
## truth
## predict no yes
## no 31186 2860
## yes 752 1372
svm3_train_cm <- confusionMatrix(svm3_train_pred, bank_train3$y)
svm3_train_cm$overall["Accuracy"]
## Accuracy
## 0.9001382
# predict and evaluate on testing data
svm3_test_pred <- predict(bank_svm3, bank_test3)
table(predict = svm3_test_pred, truth = bank_test3$y)
## truth
## predict no yes
## no 7801 732
## yes 183 325
svm3_test_cm <- confusionMatrix(svm3_test_pred, bank_test3$y)
svm3_test_cm$overall["Accuracy"]
## Accuracy
## 0.8987944
Again, there is minimal change to the errors and accuracy.
svm3_log <- data.frame(
ID = 9,
Model = "SVM",
Features = "all",
Hyperparameters = "RBF kernel, cost = 0.1, gamma = 0.024",
Train = 0.90,
Test = 0.90,
Notes = "no real improvement"
)
experiment_log <- bind_rows(experiment_log, svm3_log)
Objective: To perform cross-validation on models with different, commonly used gamma values.
Variations: Based on this hyperparameter tuning, the gamma will be either 0.001, 0.024 (same), 0.1, or 1.
Evaluation: The same table and accuracy will be generated.
Experiment:
# grid search over gamma with 10-fold cross-validation
RBF_tune_mod <- tune(svm,
                     y ~ .,
                     data = bank_train,
                     kernel = "radial",
                     ranges = list(gamma = c(0.001, 0.024, 0.1, 1)))
summary(RBF_tune_mod)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## gamma
## 0.1
##
## - best performance: 0.1607619
##
## - Detailed performance results:
## gamma error dispersion
## 1 0.001 0.2163326 0.015528251
## 2 0.024 0.1726482 0.009425219
## 3 0.100 0.1607619 0.011262381
## 4 1.000 0.2410393 0.021859604
A gamma of 0.1 was determined to give the best performance, with a cross-validation error rate of about 0.16; this is not far off the 0.17 error at the previous default gamma of 0.024.
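e1071 also provides a one-line visual of these tuning results (error versus gamma over the grid):
plot(RBF_tune_mod)  # performance curve across the gamma values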
best_rbf_mod <- RBF_tune_mod$best.model
# predict and evaluate on training data
brbf_train_pred <- predict(best_rbf_mod, bank_train3)
table(predict = brbf_train_pred, truth = bank_train3$y)
## truth
## predict no yes
## no 31318 2883
## yes 620 1349
brbf_train_cm <- confusionMatrix(brbf_train_pred, bank_train3$y)
brbf_train_cm$overall["Accuracy"]
## Accuracy
## 0.9031518
# predict and evaluate on testing data
brbf_test_pred <- predict(best_rbf_mod, bank_test3)
table(predict = brbf_test_pred, truth = bank_test3$y)
## truth
## predict no yes
## no 7809 763
## yes 175 294
brbf_test_cm <- confusionMatrix(brbf_test_pred, bank_test3$y)
brbf_test_cm$overall["Accuracy"]
## Accuracy
## 0.8962504
Once again, hyperparameter tuning appeared to have little effect on the performance.
svm4_log <- data.frame(
ID = 10,
Model = "SVM",
Features = "all",
Hyperparameters = "tuned to best gamma = 0.1",
Train = 0.90,
Test = 0.90,
Notes = "no real improvement"
)
experiment_log <- bind_rows(experiment_log, svm4_log)
knitr::kable(experiment_log, format = "pipe", padding = 0)
ID | Model | Features | Hyperparameters | Train | Test | Notes |
---|---|---|---|---|---|---|
1 | Decision Tree | duration, poutcome, pdays | none | 0.90 | 0.89 | marketing features only |
2 | Decision Tree | poutcome, pdays | none | 0.89 | 0.89 | dropped duration, minimal changes |
3 | Random Forest | all, with different ranking order from decision trees after ‘duration’ | impute method, number of trees | 1.00 | 0.85 | overfitting |
4 | Random Forest | ranked ‘duration’, ‘month’, and ‘poutcome’ | leaf size, number of features randomly sampled | 0.85 | 0.81 | less accurate, lowered variance |
5 | XGBoost | all | nrounds = 100, defaults | 0.96 | 0.91 | duration ranked first |
6 | XGBoost | all | k-fold cross-validation, gamma, minimum child weight, nrounds = 55 | 0.94 | 0.91 | boosting rounds reduced significantly, similar accuracy |
7 | SVM | all | cost = 1 | 0.90 | 0.89 | same accuracy as decision tree experiments |
8 | SVM | all | tuned to best cost = 0.01 | 0.90 | 0.89 | no real improvement |
9 | SVM | all | RBF kernel, cost = 0.1, gamma = 0.024 | 0.90 | 0.90 | no real improvement |
10 | SVM | all | tuned to best gamma = 0.1 | 0.90 | 0.90 | no real improvement |
The previous experiments determined that XGBoost was the best-performing model by accuracy, compared to decision trees and random forests. Adding these SVM results has not changed that conclusion. The SVM models, even when tuned for the best cost or gamma hyperparameters, did not vary much in accuracy from each other, nor did they meaningfully exceed the results of the previous second-best algorithm, decision trees (which, unlike SVMs, required no particular data manipulation such as imputation, encoding, or scaling before training).
In the context of this binary classification problem on this large, multidimensional banking dataset, the XGBoost ensemble method performed better than single models like the SVM and the decision tree, as could be expected.
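Finally, since the log was imported from experiment_log.csv at the start, writing it back keeps these SVM rows available for the next installment; overwriting the same file is an assumption on my part.
write_csv(experiment_log, "experiment_log.csv")  # persist the updated log (assumed path)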