Data622 - Assignment 3: Support Vector Machines

Author

Anthony Josue Roman

Introduction

This assignment continues the analytical work from Homework 2, where I developed predictive models to explore patterns within the same dataset. In that assignment, I used decision-tree-based methods to understand how the features relate to the target variable, while assessing the model’s accuracy and interpretability.

For this project, I extend that analysis using the Support Vector Machine (SVM) algorithm. SVMs are powerful supervised learning models that seek the best possible hyperplane to separate data points across classes. Unlike decision trees, which make decisions through hierarchical splitting, an SVM maximizes the margin between classes. With a linear kernel the boundary is a flat hyperplane, while kernels such as the polynomial and radial basis function (RBF) kernels allow it to capture nonlinear relationships.

The main goal of this analysis is to evaluate SVM models against the previously developed decision-tree model in terms of performance and interpretability. I use the same dataset and preprocessing pipeline so that both approaches are compared consistently. Through this exercise I want to find out whether the SVM improves predictive accuracy on this dataset, and to reflect on how each algorithm’s characteristics match the data structure and the overall research goal.

The following sections describe the dataset and the implementation and tuning of the SVM, summarize the results obtained for different kernels, and conclude with a discussion comparing the outcomes from both models.

Model Experimentation and Training

At this stage of the project, I train and evaluate Support Vector Machines (SVMs) against the Decision Tree model developed in Homework 2. Both models are evaluated on the same UCI Bank Marketing dataset, which allows a direct contrast between their performance when they are trained on an identical feature set and target variable. I reused the prior preprocessing pipeline: one-hot encoding of the categorical features, balancing the target variable in the training data with SMOTE, and scaling the features so that the comparison is consistent and fair.
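To make the balancing step concrete, here is a minimal sketch of the interpolation idea behind SMOTE. It is a simplified illustration, not the imbalanced-learn implementation used in Homework 2; the function name `smote_like_sample` and the two example customers are hypothetical.

```python
import random


def smote_like_sample(point, neighbor, rng):
    """Return a synthetic point on the line segment between two minority-class points."""
    u = rng.random()  # interpolation weight in [0, 1)
    return [p + u * (n - p) for p, n in zip(point, neighbor)]


rng = random.Random(42)
minority_point = [30.0, 1200.0]     # hypothetical (age, balance) of a "yes" customer
minority_neighbor = [34.0, 1500.0]  # hypothetical nearby "yes" customer

synthetic = smote_like_sample(minority_point, minority_neighbor, rng)
print(synthetic)
```

Because each synthetic row sits between two real minority rows, the resampled training set contains points that are close copies of one another, which matters later when cross-validation is run on the resampled data.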

Data Scaling for SVM

Prior to model training, I standardized all numerical and encoded features using StandardScaler. Support vector machines (SVMs) are sensitive to the scaling of input data because the algorithm is based on distance calculations in feature space. If the data are not scaled to a mean of 0 and unit variance, a few features with larger numeric ranges can dominate those distances and bias the results. After standardization, all features contribute on a comparable scale and the optimization converges more reliably.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_resampled)
X_test_scaled = scaler.transform(X_test_processed)

print("Train shape (scaled):", X_train_scaled.shape)
print("Test shape (scaled):", X_test_scaled.shape)

Train shape (scaled): (63874, 61)
Test shape (scaled): (9043, 61)

This confirmed that the training and test sets had the same number of features (61) after scaling, with 63,874 training rows after SMOTE resampling and 9,043 test rows, which means the data were ready for model fitting.
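The fit-on-train, transform-on-test pattern above can be sketched by hand for a single feature. The values below are hypothetical; the only thing taken from StandardScaler is that it centers on the training mean and divides by the population standard deviation of the training data.

```python
from statistics import fmean, pstdev

# Hypothetical values of one numeric feature
train_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_values = [3.0, 11.0]

# Statistics are learned from the training data only
mu = fmean(train_values)
sigma = pstdev(train_values)  # population standard deviation

train_scaled = [(v - mu) / sigma for v in train_values]
# The test data reuse the training statistics, so no test information leaks into the scaler
test_scaled = [(v - mu) / sigma for v in test_values]

print(mu, sigma)
print(test_scaled)
```

The scaled training values have mean 0 and unit variance by construction, while the test values generally do not, which is expected.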

Baseline: Decision Tree Classifier

To start, I retrained a Decision Tree model similar to the one I had in Homework 2. Decision Trees recursively split the data into subgroups based on thresholds on the input features until a stopping rule, such as a maximum depth, is reached; each leaf then predicts the outcome. Decision Trees are extremely interpretable, but they can struggle when the relationship between the input features and the target is complicated or non-linear. To limit overfitting, I capped the depth of the tree at 5, which I considered a fair compromise between interpretability and performance.

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

dt_model = DecisionTreeClassifier(max_depth=5, random_state=42)
dt_model.fit(X_train_resampled, y_train_resampled)
y_pred_dt = dt_model.predict(X_test_processed)

print("Decision Tree Performance")
print(classification_report(y_test, y_pred_dt))
print("Confusion Matrix (Decision Tree)")
print(confusion_matrix(y_test, y_pred_dt))

Decision Tree Performance
              precision    recall  f1-score   support

          no       0.92      0.87      0.89      7985
         yes       0.30      0.43      0.35      1058

    accuracy                           0.81      9043
   macro avg       0.61      0.65      0.62      9043
weighted avg       0.85      0.81      0.83      9043

Confusion Matrix (Decision Tree)
[[6908 1077]
 [ 606  452]]

The Decision Tree helped me identify which features most strongly predicted term deposit subscriptions. On the test set it reached 0.81 accuracy but recovered only 43% of actual subscribers (recall of 0.43 on the "yes" class). The downside of Decision Trees is that they draw straight, axis-aligned boundaries, so they may miss more subtle relationships between features. This is where I anticipated SVM would outperform Decision Trees.

SVM with Linear Kernel

I implemented a basic SVM using a linear kernel to observe how well the data could be separated with a straight decision boundary. The linear kernel model is simple; however, it can only represent a linear decision boundary. I used the regularization value C=1, which sets the trade-off between achieving a wider margin and minimizing the number of training misclassifications.

from sklearn.svm import SVC

svm_linear = SVC(kernel="linear", C=1, random_state=42)
svm_linear.fit(X_train_scaled, y_train_resampled)
y_pred_svm_linear = svm_linear.predict(X_test_scaled)

print("SVM (Linear Kernel) Performance")
print(classification_report(y_test, y_pred_svm_linear))
print("Confusion Matrix (SVM Linear)")
print(confusion_matrix(y_test, y_pred_svm_linear))

SVM (Linear Kernel) Performance
              precision    recall  f1-score   support

          no       0.94      0.72      0.82      7985
         yes       0.24      0.67      0.36      1058

    accuracy                           0.71      9043
   macro avg       0.59      0.70      0.59      9043
weighted avg       0.86      0.71      0.76      9043

Confusion Matrix (SVM Linear)
[[5748 2237]
 [ 345  713]]

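This first SVM model allowed me to see if the dataset had a strong linear structure. On the test set it reached 0.71 accuracy, below the Decision Tree's 0.81, although its recall on the "yes" class (0.67) was the highest of any model in this analysis. The results gave me a baseline for how SVM performs without kernel transformations.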
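The role of C can be illustrated with the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below evaluates that objective for a fixed, hand-picked boundary on three hypothetical points; it does not train anything, and the function name `soft_margin_objective` is my own.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(hinge losses) for labels y in {-1, +1}."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wi * xij for wi, xij in zip(w, xi)) + b
        # A point contributes loss when it is misclassified or inside the margin
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total


# Hypothetical toy data: the second point lies inside the margin
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

for C in (0.1, 1, 10):
    print(C, soft_margin_objective(w, b, X, y, C))
```

With a small C the margin violation is cheap and a wide margin is preferred; with a large C the same violation dominates the objective, pushing the optimizer toward boundaries that fit the training points more tightly.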

SVM with RBF Kernel

I subsequently trained an SVM using the Radial Basis Function (RBF) kernel, which enables the model to learn non-linear relationships between features. The RBF kernel implicitly maps data points to a higher-dimensional space where complex relationships can be separated by smooth, curved boundaries. This is especially relevant for this dataset because customer responses to marketing are seldom linear.

svm_rbf = SVC(kernel="rbf", C=1, gamma="scale", random_state=42)
svm_rbf.fit(X_train_scaled, y_train_resampled)
y_pred_svm_rbf = svm_rbf.predict(X_test_scaled)

print("SVM (RBF Kernel) Performance")
print(classification_report(y_test, y_pred_svm_rbf))
print("Confusion Matrix (SVM RBF)")
print(confusion_matrix(y_test, y_pred_svm_rbf))

SVM (RBF Kernel) Performance
              precision    recall  f1-score   support

          no       0.92      0.95      0.94      7985
         yes       0.50      0.35      0.41      1058

    accuracy                           0.88      9043
   macro avg       0.71      0.65      0.67      9043
weighted avg       0.87      0.88      0.87      9043

Confusion Matrix (SVM RBF)
[[7612  373]
 [ 684  374]]

The gamma parameter controls how far the influence of each training example reaches. A smaller gamma results in a smoother boundary, while a larger gamma allows the model to fit closely around specific data points. Overall, this kernel reached higher test accuracy than the linear kernel (0.88 versus 0.71), suggesting that the relationships within the data are non-linear. The gain came with a trade-off: recall on the "yes" class fell from 0.67 to 0.35, while precision rose from 0.24 to 0.50.
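The effect of gamma can be seen directly in the RBF kernel formula, K(x, z) = exp(-gamma * ||x - z||^2). The sketch below uses two hypothetical points. The last lines rely on an assumption: scikit-learn defines gamma="scale" as 1 / (n_features * X.var()), and I assume the variance of the standardized training matrix is close to 1, which would put "scale" near 1/61 here.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF similarity between two points: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x = [0.0, 0.0]
z = [1.0, 1.0]  # squared distance to x is 2

# Larger gamma makes similarity decay faster, so each point influences a smaller region
for gamma in (0.01, 0.1, 1.0):
    print(gamma, round(rbf_kernel(x, z, gamma), 3))

# Assumption: X.var() is about 1 after standardization, with 61 features
approx_scale_gamma = 1.0 / (61 * 1.0)
print(round(approx_scale_gamma, 4))
```

Under that assumption, the grid value gamma=0.1 is several times larger than the default, meaning a noticeably more local and flexible boundary.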

Hyperparameter Tuning for SVM (RBF)

After testing the base RBF model, I utilized grid search to optimize hyperparameters and find the configuration that provided the best performance.

I focused on tuning the penalty parameter C and kernel coefficient gamma, both parameters that determine how complex the boundary becomes.

Cross-validation was used during the tuning phase so that the selected model would generalize well and not overfit to a single subset of the data.

from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 0.01],
    "kernel": ["rbf"],
}

grid_search = GridSearchCV(
    SVC(random_state=42),
    param_grid,
    cv=5,
    verbose=1,
    n_jobs=-1,
)
grid_search.fit(X_train_scaled, y_train_resampled)

print("Best Parameters:", grid_search.best_params_)

best_svm = grid_search.best_estimator_
y_pred_best = best_svm.predict(X_test_scaled)

print("Tuned SVM (Best RBF) Performance")
print(classification_report(y_test, y_pred_best))
print("Confusion Matrix (Tuned SVM RBF)")
print(confusion_matrix(y_test, y_pred_best))

Fitting 5 folds for each of 9 candidates, totalling 45 fits
Best Parameters: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
Tuned SVM (Best RBF) Performance
              precision    recall  f1-score   support

          no       0.90      0.95      0.92      7985
         yes       0.36      0.24      0.29      1058

    accuracy                           0.86      9043
   macro avg       0.63      0.59      0.61      9043
weighted avg       0.84      0.86      0.85      9043

Confusion Matrix (Tuned SVM RBF)
[[7550  435]
 [ 808  250]]

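The grid search selected C=10 and gamma=0.1 as the combination with the best cross-validated accuracy on the resampled training data (accuracy is the default scoring that GridSearchCV uses for a classifier).

On the held-out test set, however, the tuned model did not improve on the default RBF SVM: accuracy fell from 0.88 to 0.86 and recall on the "yes" class from 0.35 to 0.24. One likely contributor is that cross-validation was run on SMOTE-resampled data, so the validation folds contained synthetic points interpolated from rows in the training folds, which tends to reward a more complex boundary than the original class distribution supports. Both RBF models still exceeded the linear SVM and the Decision Tree in accuracy, which is consistent with non-linear structure in the data.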
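The "9 candidates, 45 fits" line in the log follows directly from the grid definition. This small sketch enumerates the same combinations with the standard library, without running any model:

```python
from itertools import product

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 0.01],
    "kernel": ["rbf"],
}
n_folds = 5

# Every combination of one value per parameter is one candidate model
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * n_folds

print(len(candidates), "candidates,", total_fits, "cross-validation fits")
```

GridSearchCV then refits the best candidate once on the full training set (its default `refit=True` behavior), which is the model exposed as `best_estimator_`.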

Model Comparison Summary

To summarize all the results, I created a comparison table of each model and its accuracy. This helped to illustrate how test accuracy varies with the choice of kernel and with parameter tuning.

import pandas as pd
from sklearn.metrics import accuracy_score

results = pd.DataFrame({
    "Model": ["Decision Tree", "SVM Linear", "SVM RBF", "SVM Tuned RBF"],
    "Accuracy": [
        accuracy_score(y_test, y_pred_dt),
        accuracy_score(y_test, y_pred_svm_linear),
        accuracy_score(y_test, y_pred_svm_rbf),
        accuracy_score(y_test, y_pred_best),
    ],
})
results

           Model  Accuracy
0  Decision Tree  0.813889
1     SVM Linear  0.714475
2        SVM RBF  0.883114
3  SVM Tuned RBF  0.862546

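After viewing those results, I determined that the SVM with an RBF kernel had the highest test accuracy: 0.883 with the default settings and 0.863 after grid search, compared with 0.814 for the Decision Tree and 0.714 for the linear SVM. This suggests that there are patterns in the Bank Marketing dataset that are better modeled with a flexible, non-linear boundary than by the rigid splits of a Decision Tree. Although the Decision Tree remains more explainable, the RBF SVM provided more predictive power in terms of accuracy; the grid-searched parameters did not add to it on the test set. Accuracy alone should be read with care on this imbalanced test set, since predicting "no" for every customer would already score about 0.88 (7,985 of 9,043). In practice I would use SVMs for accuracy- and performance-focused tasks, and Decision Trees where interpretability and communication of model logic are valued above all.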
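The accuracy column can be put in context by recomputing per-class figures from the confusion matrices printed earlier. The sketch below copies those four matrices by hand (rows are true labels, columns are predicted labels, in the order "no", "yes") and derives accuracy, precision, and recall for the "yes" class, plus the accuracy of always predicting "no".

```python
# Confusion matrices copied from the outputs above: [[TN, FP], [FN, TP]]
confusion = {
    "Decision Tree": [[6908, 1077], [606, 452]],
    "SVM Linear": [[5748, 2237], [345, 713]],
    "SVM RBF": [[7612, 373], [684, 374]],
    "SVM Tuned RBF": [[7550, 435], [808, 250]],
}


def summarize(cm):
    """Accuracy plus precision and recall for the positive ("yes") class."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "yes_precision": tp / (tp + fp),
        "yes_recall": tp / (tp + fn),
    }


summary = {name: summarize(cm) for name, cm in confusion.items()}

# Accuracy of a trivial model that predicts "no" for every customer
(tn, fp), (fn, tp) = confusion["Decision Tree"]
majority_baseline = (tn + fp) / (tn + fp + fn + tp)

for name, metrics in summary.items():
    print(f"{name:14s}", {k: round(v, 3) for k, v in metrics.items()})
print("Always-no baseline accuracy:", round(majority_baseline, 3))
```

Read this way, the models differ mainly in how they trade precision against recall on subscribers, which is the quantity a marketing team would care about.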

Visual Comparison of Model Accuracy

import matplotlib.pyplot as plt

plt.figure(figsize=(6,4))
plt.bar(results["Model"], results["Accuracy"])
plt.xticks(rotation=20)
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.title("Model Accuracy on Test Set")
plt.tight_layout()
plt.show()

Comparison of test accuracy for the Decision Tree and SVM models.

The test accuracy is shown here for each of the models used. The untuned RBF SVM gives the highest accuracy among these models, followed by the tuned RBF SVM, the Decision Tree, and lastly the linear SVM. This supports the hypothesis that the flexibility of a non-linear kernel increases predictive accuracy on the Bank Marketing dataset, although the grid-searched hyperparameters did not improve on the defaults.

import seaborn as sns

cm_best = confusion_matrix(y_test, y_pred_best)

plt.figure(figsize=(4, 4))
sns.heatmap(cm_best, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix: Tuned SVM RBF")
plt.tight_layout()
plt.show()

Confusion matrix for the tuned SVM with RBF kernel on the test set.

This shows the confusion matrix for the tuned SVM. The diagonal cells count correctly classified customers and the off-diagonal cells count misclassified ones. Most predictions lie on the diagonal, but this is driven by the majority class: the model correctly identifies 7,550 of 7,985 non-subscribers but only 250 of 1,058 subscribers.

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

y_test_binary = y_test.map({"no": 0, "yes": 1})

y_scores_dt = dt_model.predict_proba(X_test_processed)[:, 1]
fpr_dt, tpr_dt, _ = roc_curve(y_test_binary, y_scores_dt)
roc_auc_dt = auc(fpr_dt, tpr_dt)

y_scores_rbf = svm_rbf.decision_function(X_test_scaled)
fpr_rbf, tpr_rbf, _ = roc_curve(y_test_binary, y_scores_rbf)
roc_auc_rbf = auc(fpr_rbf, tpr_rbf)

y_scores_best = best_svm.decision_function(X_test_scaled)
fpr_best, tpr_best, _ = roc_curve(y_test_binary, y_scores_best)
roc_auc_best = auc(fpr_best, tpr_best)

plt.figure(figsize=(6, 5))
plt.plot(fpr_dt, tpr_dt, color="gray", lw=2,
         label=f"Decision Tree (AUC = {roc_auc_dt:.3f})")
plt.plot(fpr_rbf, tpr_rbf, color="blue", lw=2,
         label=f"SVM RBF (AUC = {roc_auc_rbf:.3f})")
plt.plot(fpr_best, tpr_best, color="darkorange", lw=2,
         label=f"Tuned SVM RBF (AUC = {roc_auc_best:.3f})")

plt.plot([0, 1], [0, 1], color="black", lw=1, linestyle="--")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison: Decision Tree vs SVM")
plt.legend(loc="lower right")
plt.tight_layout()
plt.show()

ROC curve comparison for Decision Tree, SVM RBF, and Tuned SVM RBF.

This plot compares the ROC curves for the Decision Tree, SVM RBF, and Tuned SVM RBF models. The tuned SVM has the largest area under the curve (AUC), signaling its ability to better distinguish positive from negative outcomes than the other models. This visual comparison complements the accuracy table above, because it shows how well each model ranks customers across all decision thresholds rather than at a single cutoff.
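AUC has a simple ranking interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way on four hypothetical scores; the function name `auc_by_ranking` is my own, and for real data `roc_curve` and `auc` as used above are the practical choice.

```python
def auc_by_ranking(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc_by_ranking(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```

This is why AUC can disagree with accuracy: it depends only on how the scores order the customers, not on where the classification threshold is placed.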

Review of Articles

To further develop my understanding of Support Vector Machines and Decision Trees, I read both of the required readings, together with three additional peer-reviewed studies comparing the algorithms across different application domains. Together, these sources helped me see how the tradeoff between interpretability and predictive power shapes the decision of when to use SVM over tree-based methods.

Required Articles

The first article I reviewed was A Survey on Support Vector Machines and Their Applications in Complex Data Problems (Hindawi, 2021). This paper provided a thorough overview of how SVMs are used in high-dimensional and non-linear data contexts. The authors clarified the significance of kernel functions and underlined how the RBF kernel remains one of the most flexible and effective approaches when the data are not linearly separable. I found this directly related to my Bank Marketing analysis, since customer behavior patterns are complex and do not follow simple linear trends. The article also discussed how well SVMs generalize when samples are limited, which is helpful in marketing, where positive cases, such as customers who subscribe to a term deposit, are much rarer than negative ones.

The second mandatory article, Machine Learning Algorithms for Predictive Modeling of Health Data (NCBI, 2021), compared the performance of SVMs, Decision Trees, and Random Forests for classification tasks in healthcare. The paper demonstrated that although Decision Trees were much easier to interpret, SVMs consistently outperformed them in accuracy when the independent variables were continuous or the features interacted non-linearly. I related this finding to my own work: just as the outcomes of patients depend on complex feature interactions, customer decisions in marketing also depend on a combination of demographic, financial, and seasonal factors that may not be captured by a single tree split.

Additional Articles

The first additional article reviewed was A Comparative Analysis of Decision Tree and Support Vector Machine by Santoso (2023). In this study, both models were applied to several benchmark datasets, with results showing that SVM reached about 91 percent accuracy, whereas the Decision Tree reached 81 percent. According to the authors, SVMs are robust against noise because they maximize the margin between classes instead of simply memorizing the training examples. This explanation supported my experimental results, in which the RBF SVM reached higher test accuracy than the Decision Tree.

The second article, Comparative Study of KNN, SVM and Decision Tree Algorithm for Student Performance Prediction by Wiyono et al. (2020), applied these algorithms to an education dataset. They concluded that high-dimensional and interdependent input features favored the performance of SVMs over Decision Trees. They also pointed out that on smaller, less complex data Decision Trees were quite competitive, which I found interesting because it shows that the structure of the dataset, rather than any universal rule, should drive the choice of algorithm.

The third additional article, A Study about Explainable Artificial Intelligence: Using Decision Tree to Explain SVM (ResearchGate, 2020), examined how Decision Trees can explain the outputs of Support Vector Machines. The authors suggest constructing a surrogate tree that approximates the predictions of the SVM and thus combines the accuracy of Support Vector Machines with the interpretability of Decision Trees. I found this idea particularly appealing because it mirrors what I am doing within my own field: pursuing predictive strength without losing transparency. For instance, in marketing or operations analytics, I might train a Support Vector Machine to make predictions but then use a Decision Tree to explain which features influenced those predictions most.
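The surrogate idea can be sketched in a few lines with scikit-learn. This is my own minimal illustration on synthetic data, not the procedure from the paper: the dataset, the depth limit of 3, and the "fidelity" measure (the share of held-out points where the tree agrees with the SVM) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (not the Bank Marketing dataset)
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# 1. Train the black-box model on the true labels
svm = SVC(kernel="rbf", C=1, gamma="scale").fit(X_tr_s, y_tr)

# 2. Train a shallow tree to imitate the SVM's predictions, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_tr_s, svm.predict(X_tr_s))

# 3. Fidelity: how often the tree agrees with the SVM on unseen data
fidelity = accuracy_score(svm.predict(X_te_s), surrogate.predict(X_te_s))
print(f"Surrogate fidelity to SVM: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))
```

The printed rules describe the SVM's behavior only as well as the fidelity score allows, so a low-fidelity surrogate should not be presented as an explanation of the SVM.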

Review Insights

The five studies shared the same thread: SVMs seem to outperform Decision Trees when datasets are large, continuous, and nonlinear, though Decision Trees are still far easier to interpret and explain to business stakeholders. In my analysis, I found the same tendency: the RBF SVM had the highest accuracy on the Bank Marketing dataset, in line with what most studies reported. However, I also understand that in real-world applications, interpretability is very important. The hybrid approach in the 2020 explainability paper is particularly attractive because it combines the best of two worlds: strong predictive accuracy and model transparency. Together, these articles gave theoretical and practical support to my model choices. They helped justify why the SVM was the more accurate predictive method on this dataset while still appreciating the unique strengths of Decision Trees in communication and operational decision-making.

Additional Article Links:
- Hindawi, 2021
- NCBI, 2021
- Santoso, 2023 – ScienceDirect
- Wiyono et al., 2020 – ITS Journal
- Explainable AI – ResearchGate, 2020

Conclusion

This project allowed me to apply Support Vector Machines to the same Bank Marketing dataset that I previously analyzed with Decision Trees. Since the encoding and resampling steps remained the same, differences in the results stem mainly from the model itself, not from data preparation. As the experiments revealed, the SVM with the RBF kernel yielded the highest test accuracy (0.883 with default settings and 0.863 after grid search), and the tuned model had the largest area under the ROC curve, supporting the view that there are nonlinear relationships in the dataset that kernel methods capture better.

Yet I am also aware that Decision Trees remain more interpretable and easier to explain to a non-technical audience, even though the RBF SVMs were more accurate. This represents one of the key trade-offs in machine learning: the balance between accuracy and interpretability. Much of the literature I reviewed reported this same pattern: on complex data, SVMs outperform tree models, while Decision Trees remain useful when transparency and explicit decision logic are needed.

This assignment helped me internalize how kernel methods handle complex patterns and why model evaluation cannot be done with a single metric or visualization. In the future, I will use SVMs for analytical tasks where accuracy and generalization are of utmost importance and keep Decision Trees or simpler ensemble models for cases where clarity and interpretability are a priority.