This project builds on the model experimentation from Project 2, where various Decision Tree, Random Forest, and AdaBoost models were applied to the Bank Marketing dataset to predict customer subscriptions to term deposits. That analysis highlighted significant challenges such as severe class imbalance and the risk of data leakage, particularly from features like duration. While ensemble methods and strategic resampling techniques (e.g., SMOTE and downsampling) improved performance, no configuration yielded a model reliable enough for production use. In this assignment, the focus shifts to Support Vector Machines (SVMs) as a potential alternative, known for their effectiveness in high-dimensional spaces and their robustness in handling non-linear patterns. The objective is to evaluate how SVMs perform on the same dataset and to compare their results to those from Project 2. Additionally, academic literature is reviewed to understand how SVMs and Decision Trees compare in similar classification tasks, particularly in financial and marketing applications. This comparative study seeks to determine whether SVMs can overcome the limitations observed in the previous models and provide more accurate, generalizable predictions for term deposit marketing.
Several studies have investigated the comparative performance of Support Vector Machines (SVM) and decision tree-based models for financial and clinical prediction tasks, providing valuable insights into model behavior under conditions of class imbalance and complex feature interactions.
Ahmad et al. (2021), in their study Decision Tree Ensembles to Predict Coronavirus Disease 2019 Infection: A Comparative Study, evaluated decision tree ensemble methods such as Bagging, Random Forest, Boosting, Balanced Random Forest, and SMOTEBoost for improving minority class detection in a clinical setting. Although their primary domain focused on COVID-19 diagnosis, their findings are broadly applicable to imbalanced classification problems. Ahmad et al. showed that standard ensemble models like Random Forest achieved high accuracy but performed poorly on recall when applied to imbalanced data, highlighting the risk of overlooking minority cases without resampling. They also found that while oversampling techniques such as SMOTE improved recall, they sometimes introduced noise that reduced model stability, underscoring the trade-offs inherent in balancing techniques.
Guhathakurata et al. (2021), in A Novel Approach to Predict COVID-19 Using Support Vector Machine, investigated the use of SVMs for predicting COVID-19 infection severity based on clinical symptoms. Their study emphasized the importance of hyperparameter tuning, finding that adjusting the cost parameter significantly improved recall for minority cases without relying heavily on resampling methods. Additionally, their results showed that tuned SVM models outperformed traditional ensemble models like Random Forest and AdaBoost on small, imbalanced datasets. While some methodological aspects, such as clinical feature weighting, were domain-specific, the overall conclusion that properly tuned SVMs offer advantages in imbalanced classification tasks holds relevance across various application areas.
Zhang (2025), in the study Financial Customer Behavior Prediction Based on Machine Learning: A Comprehensive Investigation, conducted a broad evaluation of machine learning models, including Random Forest, SVM, logistic regression, and deep learning models, for predicting financial customer behaviors. Zhang emphasized the need to address class imbalance with techniques like SMOTE and advocated for the use of recall, F1-score, and AUC as evaluation metrics rather than relying solely on accuracy. The study found that Random Forest models initially achieved strong accuracy but struggled with minority detection, and that SVM models with non-linear kernels, when carefully tuned, could outperform ensemble models in recall and F1-score, even without significant oversampling.
Tang and Zhu (2024), in Enhancing Bank Marketing Strategies with Ensemble Learning: Empirical Analysis, explored ensemble models for customer behavior prediction in the banking sector, comparing Random Forest, SVM, and a hybrid Random Forest–XGBoost model. They addressed class imbalance through hybrid resampling methods combining oversampling and under-sampling strategies. Their results showed that the hybrid ensemble model outperformed individual models, achieving notable improvements across precision, recall, and F1-score. The study highlighted the limitations of relying solely on accuracy and pointed to the benefits of ensemble combinations and advanced resampling techniques for improving minority class detection.
Fan (2023), in the study Predicting of Credit Default by SVM and Decision Tree Model Based on Credit Card Data, compared SVM and Decision Tree models using regression-based evaluation metrics. Fan found that Decision Trees generally outperformed SVMs on RMSE and MSE, indicating better predictive reliability, while SVM models showed slightly better MAE. The study also emphasized that Decision Trees required less preprocessing and hyperparameter tuning compared to SVMs and offered greater interpretability through their rule-based structures, making them a practical choice for financial modeling tasks involving complex feature interactions.
Collectively, these studies underline important themes in model selection for imbalanced classification tasks. Key considerations include the trade-offs between accuracy and minority class detection, the benefits and risks of resampling techniques, the critical role of hyperparameter tuning for SVMs, and the advantages of ensemble learning strategies. These findings provide a foundation for evaluating model performance in both clinical and financial prediction contexts.
Several challenges emerged early in the modeling process. Training SVMs on the full dataset was computationally expensive and sometimes unstable due to the extreme class imbalance. Additionally, three variables were excluded prior to modeling to improve cross-validation reliability: duration (excluded due to data leakage concerns, as it would artificially inflate model performance if included), pdays (excluded due to low variability), and default (excluded due to extreme imbalance). Removing these variables helped mitigate issues that could otherwise introduce bias or instability during model evaluation.
To further address the class imbalance, a smaller and more balanced training set was created. Specifically, 20% of the available “yes” cases were randomly sampled, along with three times that number of “no” cases. This yielded a roughly 3:1 “no”-to-“yes” ratio, far more balanced than the full dataset, while ensuring that each cross-validation fold contained a sufficient number of minority class examples for meaningful evaluation.
# Libraries used in this section (assumed to be loaded once, e.g., in a setup chunk)
library(dplyr)   # data manipulation
library(purrr)   # keep()
library(caret)   # findCorrelation(), nearZeroVar()

# Read the full Bank Marketing dataset (semicolon-delimited, strings as factors)
bank_marketing <- read.csv("bank-additional-full.csv", sep = ";", stringsAsFactors = TRUE)
# bank_marketing <- read.csv("bank.csv", sep = ";", stringsAsFactors = TRUE)  # smaller alternative

# Check numeric features for high pairwise correlation (|r| > 0.75)
corr_matrix <- bank_marketing |>
  keep(is.numeric) |>
  cor()
highCorrelation <- findCorrelation(corr_matrix, cutoff = 0.75)
hcol <- names(bank_marketing)[highCorrelation]

# Identify near-zero-variance predictors
noVar <- nearZeroVar(bank_marketing)
columns_to <- names(bank_marketing)[noVar]
columns_to
## [1] "pdays"

# Drop duration (data leakage), default (extreme imbalance), and pdays (near-zero variance);
# other candidates (e.g., housing, campaign, poutcome, month) were tested but ultimately retained
bank_marketing_cl <- bank_marketing |>
  select(-all_of(c("duration", "default", "pdays")))
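The code that produced the smaller, more balanced training set is not shown in this section. The sketch below illustrates the sampling strategy described above (20% of the “yes” cases plus three times as many “no” cases), assuming dplyr; the seed, the slice_sample() calls, and the dt_train object name are assumptions rather than the original implementation. A separate hold-out split (dt_test, used later for evaluation) is assumed to have been created beforehand.
# Sketch only: balanced training subset as described above (assumed seed and object names)
set.seed(123)
yes_cases <- bank_marketing_cl |> filter(y == "yes")
no_cases  <- bank_marketing_cl |> filter(y == "no")
yes_sample <- yes_cases |> slice_sample(prop = 0.20)               # 20% of the minority class
no_sample  <- no_cases  |> slice_sample(n = 3 * nrow(yes_sample))  # three "no" cases per sampled "yes"
# Shuffle the combined rows so class order does not influence cross-validation folds
dt_train <- bind_rows(yes_sample, no_sample) |> slice_sample(prop = 1)
table(dt_train$y)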
Model training and evaluation were carried out using Support Vector Machines (SVMs) with multiple kernel functions. The kernels considered included linear, radial basis function (RBF), polynomial, and sigmoid. The modeling process focused on systematically tuning hyperparameters and assessing model performance across several key metrics.
The following performance metrics were selected for evaluation to provide a comprehensive view of each model’s strengths and weaknesses: Accuracy, Precision, Recall (Sensitivity), Specificity, Negative Predictive Value (NPV), F1 Score, Area Under the ROC Curve (AUC), Log Loss, Error Rate, and Balanced Accuracy.
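Most of these metrics are derived directly from the confusion matrix (AUC and Log Loss additionally require predicted class probabilities). The helper below is not part of the original analysis; it is only an illustration of how the less common metrics, such as NPV and Balanced Accuracy, relate to the usual counts.
# Illustration only: confusion-matrix metrics from TP, FP, TN, FN counts
confusion_metrics <- function(tp, fp, tn, fn) {
  precision   <- tp / (tp + fp)
  recall      <- tp / (tp + fn)              # sensitivity
  specificity <- tn / (tn + fp)
  npv         <- tn / (tn + fn)              # negative predictive value
  accuracy    <- (tp + tn) / (tp + fp + tn + fn)
  c(
    Accuracy            = accuracy,
    Precision           = precision,
    Recall              = recall,
    Specificity         = specificity,
    NPV                 = npv,
    `F1 Score`          = 2 * precision * recall / (precision + recall),
    `Error Rate`        = 1 - accuracy,
    `Balanced Accuracy` = (recall + specificity) / 2
  )
}
confusion_metrics(tp = 50, fp = 70, tn = 800, fn = 60)  # toy example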
A 5-fold cross-validation scheme was implemented for outer model evaluation, with a 2-fold inner cross-validation used for hyperparameter tuning. Grid search was selected as the tuning strategy to explore combinations of kernel-specific hyperparameters. Parallel processing was employed to improve computational efficiency, using 10 cores for model training and tuning.
For each kernel type, the following hyperparameters were tuned (an illustrative setup sketch follows the list):
Linear Kernel: tuning of the cost parameter across values from 0.1 to 1.0, in increments of 0.2.
Radial and Sigmoid Kernels: tuning of cost (0.1 to 1.0 by 0.2) and gamma (0.01 to 0.2 by 0.05).
Polynomial Kernel: tuning of cost (values 1 to 3), gamma (values 0.01, 0.05, and 0.2), and degree (1 or 2).
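The tuning code itself is not reproduced here. The sketch below shows one way this setup could be expressed with the mlr and parallelMap packages used elsewhere in this analysis; only the radial kernel's grid is written out, the positive-class setting and object names are assumptions, and my_measures refers to the list of mlr measure objects sketched after the metric_names block further below.
# Sketch of the nested tuning setup (radial kernel shown; other kernels are analogous)
library(mlr)
library(parallelMap)
dt_train_cl <- dt_train |> janitor::clean_names()           # mirror the test-set cleaning below
task <- makeClassifTask(data = dt_train_cl, target = "y", positive = "yes")

inner <- makeResampleDesc("CV", iters = 2)                  # inner CV for hyperparameter tuning
outer <- makeResampleDesc("CV", iters = 5)                  # outer CV for model evaluation
ctrl  <- makeTuneControlGrid()                              # exhaustive grid search

ps_radial <- makeParamSet(
  makeDiscreteParam("cost",  values = seq(0.1, 1.0, by = 0.2)),
  makeDiscreteParam("gamma", values = seq(0.01, 0.20, by = 0.05))
)

lrn_radial <- makeLearner("classif.svm", predict.type = "prob",
                          par.vals = list(kernel = "radial"))
tuned_radial <- makeTuneWrapper(lrn_radial, resampling = inner,
                                par.set = ps_radial, control = ctrl)

parallelStartSocket(10)                                     # 10 cores, as noted above
res_radial <- resample(tuned_radial, task, outer, measures = my_measures)  # outer-CV estimate
trained_models <- list(radial = train(tuned_radial, task))  # refit the tuned model on the full training task
parallelStop()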
metric_names <- c(
"Accuracy",
"Precision",
"Recall (Sensitivity)",
"Specificity",
"NPV (Neg. Pred. Value)",
"F1 Score",
"AUC",
"Log Loss",
"Error Rate",
"Balanced Accuracy"
)
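The my_measures object used during training and in the test-set loop below is not defined in this section. One plausible definition, assuming mlr's built-in measure objects in the same order as metric_names, would be:
# Assumed definition: mlr measures aligned with the metric_names labels above
my_measures <- list(
  acc,      # Accuracy
  ppv,      # Precision (positive predictive value)
  tpr,      # Recall / Sensitivity
  tnr,      # Specificity
  npv,      # Negative predictive value
  f1,       # F1 score
  auc,      # AUC (requires predict.type = "prob")
  logloss,  # Log loss (requires predict.type = "prob")
  mmce,     # Mean misclassification error (error rate)
  bac       # Balanced accuracy
)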
trained_models
## $linear
## Model for learner.id=classif.svm.tuned; learner.class=TuneWrapper
## Trained on: task.id = dt_train_cl; obs = 3712; features = 17
## Hyperparameters: kernel=linear
##
## $radial
## Model for learner.id=classif.svm.tuned; learner.class=TuneWrapper
## Trained on: task.id = dt_train_cl; obs = 3712; features = 17
## Hyperparameters: kernel=radial
##
## $polynomial
## Model for learner.id=classif.svm.tuned; learner.class=TuneWrapper
## Trained on: task.id = dt_train_cl; obs = 3712; features = 17
## Hyperparameters: kernel=polynomial
##
## $sigmoid
## Model for learner.id=classif.svm.tuned; learner.class=TuneWrapper
## Trained on: task.id = dt_train_cl; obs = 3712; features = 17
## Hyperparameters: kernel=sigmoid
# Clean and prepare test set (clean_names() is from the janitor package)
dt_test_cl <- dt_test |> clean_names()
dt_test_cl$y <- factor(dt_test_cl$y, levels = levels(getTaskTargets(task))) # match factor levels
# Create test task
task_test <- makeClassifTask(data = dt_test_cl, target = "y")
# Store test results
test_table <- data.frame()
for (k in names(trained_models)) {
# Get model and make prediction
model <- trained_models[[k]]
pred_test <- predict(model, task = task_test)
# Get metrics
metric_values_test <- performance(pred_test, measures = my_measures)
# Build row
results_df_test <- data.frame(
Model = paste("SVM", tools::toTitleCase(k), sep = " "),
t(as.data.frame(metric_values_test))
)
names(results_df_test)[-1] <- metric_names
test_table <- rbind(test_table, results_df_test)
}
final_table$Dataset <- "Training"
test_table$Dataset <- "Test"
# Combine the two tables
combined_results <- rbind(final_table, test_table)
# Optional: Reorder columns so 'Dataset' is second
combined_results <- combined_results[, c(1, ncol(combined_results), 2:(ncol(combined_results)-1))]
The performance of the SVM models across different kernel functions was evaluated on both the training and test datasets using a variety of metrics. Overall, the results indicate modest differences between kernels, with all models exhibiting similar trends in performance.
# Show test results table
combined_results |> remove_rownames() |>
arrange(Model, Dataset) |>
kable(caption = "SVM Metrics by Kernel Type", digits = 3) |>
kable_styling(full_width = T, position = "center") |>
kable_classic()
| Model | Dataset | Accuracy | Precision | Recall (Sensitivity) | Specificity | NPV (Neg. Pred. Value) | F1 Score | AUC | Log Loss | Error Rate | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM Linear | Test | 0.882 | 0.414 | 0.469 | 0.927 | 0.941 | 0.440 | 0.754 | 0.329 | 0.118 | 0.698 |
| SVM Linear | Training | 0.811 | 0.692 | 0.439 | 0.935 | 0.833 | 0.537 | 0.755 | 0.453 | 0.189 | 0.687 |
| SVM Polynomial | Test | 0.886 | 0.425 | 0.433 | 0.936 | 0.938 | 0.429 | 0.759 | 0.330 | 0.114 | 0.684 |
| SVM Polynomial | Training | 0.818 | 0.719 | 0.448 | 0.941 | 0.837 | 0.552 | 0.773 | 0.447 | 0.182 | 0.695 |
| SVM Radial | Test | 0.880 | 0.407 | 0.468 | 0.925 | 0.941 | 0.436 | 0.753 | 0.342 | 0.120 | 0.697 |
| SVM Radial | Training | 0.823 | 0.713 | 0.494 | 0.933 | 0.847 | 0.583 | 0.771 | 0.436 | 0.177 | 0.713 |
| SVM Sigmoid | Test | 0.881 | 0.411 | 0.473 | 0.926 | 0.941 | 0.440 | 0.748 | 0.347 | 0.119 | 0.699 |
| SVM Sigmoid | Training | 0.818 | 0.703 | 0.474 | 0.933 | 0.842 | 0.566 | 0.769 | 0.460 | 0.182 | 0.704 |
On the test set, all models achieved relatively high overall accuracy (~88%), with the SVM Polynomial model achieving the highest test accuracy at 0.886, followed by the SVM Linear model at 0.882, the SVM Sigmoid model at 0.881, and the SVM Radial model at 0.880.
However, despite high accuracy, precision and recall were notably low across all kernels, highlighting challenges in correctly identifying positive cases:
- Precision ranged from 0.407 (Radial) to 0.425 (Polynomial).
- Recall (Sensitivity) ranged from 0.433 (Polynomial) to 0.473 (Sigmoid).
In terms of specificity, all models performed well, with values of 0.925 or higher, indicating a strong ability to correctly identify negative cases. Negative Predictive Value (NPV) was also consistently high (0.938–0.941), suggesting that when the models predicted the negative class, they were usually correct.
The F1 Score, which balances precision and recall, remained low across all models, ranging from 0.429 (Polynomial) to 0.440 (Linear and Sigmoid).
Similarly, Area Under the ROC Curve (AUC) scores were moderate: the SVM Polynomial model achieved the highest test AUC at 0.759, slightly outperforming the other kernels.
Log Loss values were similar across models (between 0.329 and 0.347), indicating comparable probabilistic prediction performance.
Training set accuracy was somewhat lower (~81–82%) than test set accuracy, which is expected given the more balanced class distribution of the training sample, while precision and F1 were noticeably higher than on the test set. The Radial kernel achieved the highest training balanced accuracy (0.713), suggesting slightly better handling of the class imbalance compared to the other kernels.
Overall, while all kernels demonstrated strong specificity and NPV, the Radial SVM showed marginally better recall, F1 Score, and balanced accuracy on the training set, with the Polynomial kernel slightly ahead on AUC; on the test set the kernels were nearly indistinguishable.
Nevertheless, the consistently low precision and recall across all models suggest that identifying positive cases remained challenging, highlighting a need for further refinement or alternative modeling approaches for improving minority class detection.
# Tree-based model results from Project 2, collected into a data frame for comparison
model_results <- data.frame(
Model = c("Decision Tree", "Decision Tree", "Random Forest", "Random Forest", "AdaBoost", "AdaBoost"),
Experiment_Changes = c(
"Max depth = 10, no resampling",
"Manual class weighting",
"Upsampling during cross-validation",
"Feature engineering, no numeric transforms",
"Feature engineering and numeric transforms",
"Simplified features, SMOTE and downsampling"
),
Accuracy = c(0.8331, 0.8277, 0.8559, 0.8388, 0.8341, 0.8651),
Precision = c(0.3618, 0.3545, 0.4027, 0.3694, 0.3652, 0.4215),
Recall = c(0.6300, 0.6451, 0.5769, 0.6095, 0.6401, 0.5302),
F1 = c(0.4596, 0.4576, 0.4743, 0.4600, 0.4650, 0.4696),
AUC = c(0.7848, 0.7862, 0.7839, 0.7713, 0.7997, 0.7932)
)
# Display the table
model_results |>
kable(caption = "Performance of Various Models and Experiment Setups", digits = 4) |>
kable_styling(full_width = FALSE, position = "center") |>
kable_classic()
| Model | Experiment_Changes | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|
| Decision Tree | Max depth = 10, no resampling | 0.8331 | 0.3618 | 0.6300 | 0.4596 | 0.7848 |
| Decision Tree | Manual class weighting | 0.8277 | 0.3545 | 0.6451 | 0.4576 | 0.7862 |
| Random Forest | Upsampling during cross-validation | 0.8559 | 0.4027 | 0.5769 | 0.4743 | 0.7839 |
| Random Forest | Feature engineering, no numeric transforms | 0.8388 | 0.3694 | 0.6095 | 0.4600 | 0.7713 |
| AdaBoost | Feature engineering and numeric transforms | 0.8341 | 0.3652 | 0.6401 | 0.4650 | 0.7997 |
| AdaBoost | Simplified features, SMOTE and downsampling | 0.8651 | 0.4215 | 0.5302 | 0.4696 | 0.7932 |
As summarized above, the Radial SVM offered the best balance among the kernels on the training set, but precision and recall remained consistently low across all SVM models, suggesting that identifying positive cases was still challenging and that further refinement or alternative modeling approaches are needed if improving minority class detection is a primary objective.
To explore potential improvements, tree-based models were also evaluated, including Decision Trees, Random Forests, and AdaBoost classifiers trained under different experimental setups.
In terms of recall, the tree-based models generally outperformed the SVM models. For instance, the Decision Tree model with manual class weighting achieved a recall of 0.6451, and the AdaBoost model with feature engineering reached 0.6401, both substantially higher than the best SVM test recall of 0.473, obtained with the Sigmoid kernel.
F1 scores also favored the tree-based approaches. The highest F1 score among tree-based models was 0.4743 for the Random Forest with upsampling, compared to the best SVM test F1 score of 0.440, achieved by the Linear and Sigmoid kernels.
In terms of AUC, tree-based models again showed an advantage. The AdaBoost model with feature engineering achieved the highest AUC at 0.7997, outperforming the best SVM test AUC of 0.759 (Polynomial kernel).
Accuracy was similar between approaches. The SVM models achieved test set accuracies between 0.880 and 0.886, while the best-performing tree-based model (AdaBoost with SMOTE and downsampling) achieved an accuracy of 0.8651. Although its accuracy was slightly lower, its stronger balance across recall, F1, and AUC suggests that the tree-based models handled the minority class more effectively.
Overall, while SVMs achieved slightly higher overall accuracy, tree-based models provided better recall, F1 scores, and AUC, suggesting they are more suitable when the goal is to improve minority class detection.
A review of recent studies helps explain the patterns I observed in my experiments. Ahmad et al. (2021) found that Random Forest and Boosting models achieved high overall accuracy but needed resampling techniques like SMOTE to improve minority class recall. This was similar to my experience, where applying upsampling and SMOTE improved the ability of Random Forest and AdaBoost models to detect “yes” cases. They also pointed out that SMOTE could introduce instability, which matched the variability I saw across validation folds after resampling.
Guhathakurata et al. (2021) showed that tuning SVM hyperparameters, especially the cost parameter, helped improve minority detection without needing heavy resampling. This lined up with my results, where tuning the radial and polynomial SVMs improved recall while keeping preprocessing relatively simple compared to the tree-based pipelines.
Zhang (2025) also emphasized that in imbalanced classification tasks, models should be evaluated on recall, F1-score, and AUC rather than accuracy alone. His work showed that while Random Forests benefited from SMOTE, carefully tuned SVMs could still compete in detecting minority cases. This partly reflected what I found: the tuned SVMs reached reasonable recall without aggressive resampling, although the resampled tree-based models still detected more minority cases overall.
Tang and Zhu (2024) found that hybrid resampling strategies, combining over- and under-sampling, led to even better prediction performance than using SMOTE alone. Although I only used SMOTE in my experiments, the recall improvements I observed were similar, suggesting that trying hybrid methods could be a useful next step. Their focus on evaluating models with multiple metrics also matched the approach I took.
Finally, Fan (2023) pointed out that Decision Trees needed less tuning and were easier to interpret than SVMs. This was consistent with my experience, where tree-based models like Random Forest and AdaBoost produced simpler, more transparent decision paths, while tuned SVMs required more effort and were harder to explain.
Overall, the findings from the literature closely match the strengths, challenges, and trade-offs I observed in my experiments, reinforcing the need to balance tuning, resampling, and evaluation strategies when working with imbalanced datasets.
Looking at the results and comparing them with findings from the literature, the differences between the models tested are clear. The SVM with a radial kernel showed strong overall accuracy and stable performance across different cross-validation folds. However, it struggled with precision and recall when it came to detecting the minority “yes” class. In contrast, tree-based models like Decision Trees, Random Forests, and AdaBoost consistently achieved better recall and stronger F1-scores, although they often needed heavier resampling strategies such as SMOTE or upsampling to reach that level of performance.
Even though the tuned radial SVM was simpler to configure and generalized consistently across cross-validation folds, it did not capture true positive cases as effectively as the tree-based models.
These observations line up with what other researchers have found. Prior studies have shown that while SVMs can do well with the right tuning, ensemble tree models generally perform better on imbalanced tasks when supported by proper resampling techniques. Hybrid ensemble strategies, like the ones explored by Tang and Zhu (2024), seem to offer even bigger improvements and could be a good next step.
Based on these results, tree-based models are recommended when detecting minority class outcomes is a priority, especially for tasks like predicting term deposit subscriptions. The radial SVM still has value as a baseline model, especially when computational resources or tuning time are limited. Moving forward, improving data quality, experimenting with hybrid resampling, and finding the right balance between predictive performance and model interpretability should be key areas to focus on.
Ahmad, M., Rizvi, S. T. H., Rehman, M., Ahmad, S., & Shah, S. A. A. (2021). Decision tree ensembles to predict coronavirus disease 2019 infection: A comparative study. Complexity, 2021, 5550344. https://doi.org/10.1155/2021/5550344
Fan, J. (2023). Predicting of credit default by SVM and decision tree model based on credit card data. BCP Business & Management, 38, 28–33. https://www.researchgate.net/publication/369432632_Predicting_of_Credit_Default_by_SVM_and_Decision_Tree_Model_Based_on_Credit_Card_Data
Guhathakurata, S., Kundu, S., Chakraborty, A., & Banerjee, J. S. (2021). A novel approach to predict COVID-19 using support vector machine. BioMed Research International, 2021, 8137961. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137961/
Tang, X., & Zhu, Y. (2024). Enhancing bank marketing strategies with ensemble learning: Empirical analysis. PLOS ONE, 19(1), e0294759. https://doi.org/10.1371/journal.pone.0294759
Zhang, X. (2025). Financial customer behavior prediction based on machine learning: A comprehensive investigation. ITM Web of Conferences, 73, 02004. https://www.researchgate.net/publication/389064721_Financial_Customer_Behavior_Prediction_Based_on_Machine_Learning_A_Comprehensive_Investigation