Bank Marketing Dashboard

Column

Instructions

Perform an analysis of the dataset(s) used in Homework #2 using the SVM algorithm. Compare the results with the results from previous homework.

Homework #3

Read the following articles:

https://www.hindawi.com/journals/complexity/2021/5550344/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137961/
Search for academic content (at least 3 articles) that compare the use of decision trees vs SVMs in your current area of expertise.
Perform an analysis of the dataset used in Homework #2 using the SVM algorithm.

Compare the results with the results from previous homework.

Answer questions, such as:

Which algorithm is recommended to get more accurate results?
Is it better for classification or regression scenarios?
Do you agree with the recommendations?
Why?

Introduction

In this assignment, we apply the Support Vector Machine (SVM) algorithm to the dataset used in our previous homework (Homework #2). The goal is two-fold: first, to explore the performance of SVM under different kernels and tuning parameters; and second, to compare those results with the findings from the earlier work, thereby determining whether SVM offers measurable improvements in our context.

Support Vector Machines represent a powerful class of supervised learning methods that are widely used for both classification and regression tasks. By maximizing the margin between classes (or performing regression via the so-called epsilon‐insensitive loss), SVM aim for strong generalization and robustness. Moreover, through the use of kernel functions they can flexibly handle non-linear relationships in the data.

Abstract

This assignment investigates the application of Support Vector Machines (SVM) to the dataset from Homework #2. The study involves training SVM models using multiple kernels and hyper‑parameter tuning, validating their performance, and comparing results with the algorithms used in the previous homework. Findings are analyzed in the context of existing literature comparing decision trees and SVM, with a focus on accuracy, robustness, and suitability for the domain. The analysis aims to determine whether SVM provides improved predictive performance and practical advantages over earlier methods.

Data Set

A Portuguese bank conducted a marketing campaign (phone calls) to predict if a client will subscribe to a term deposit The records of their efforts are available in the form of a dataset. The objective here is to apply machine learning techniques to analyze the dataset and figure out most effective tactics that will help the bank in next campaign to persuade more customers to subscribe to the bank’s term deposit. Download the Bank Marketing Dataset from: https://archive.ics.uci.edu/dataset/222/bank+marketing

Load libraries

I have done the same steps which I did in the previous homework to prepare the data for the modeling. In addition to the previous pre-processing steps, data scaling was added.

#A Portuguese bank conducted a marketing
# Read a CSV file
bank <- read.csv("bank.csv", sep = ";")

# Preview the first few rows of the dataset
#kable(head(bank, 10), caption = "Preview of the Bank Dataset")

# Replace "unknown" with NA
bank <- bank %>% mutate_all(~ifelse(. == "unknown", NA, .))

# Handle missing values 
for (col in names(bank)) {
  if (is.factor(bank[[col]])) {
    mode_val <- names(sort(table(bank[[col]]), decreasing = TRUE))[1]
    bank[[col]][is.na(bank[[col]])] <- mode_val
  }
}

# Convert categorical variables to factors
bank <- data.frame(lapply(bank, function(x) if(is.character(x)) factor(x) else x))

# Feature Engineering: Creating age_group
bank$age_group <- cut(bank$age, breaks = c(17, 24, 34, 44, 54, 64, 100),
                      labels = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"))

# Create a new feature based on call duration 
bank <- bank %>% mutate(long_call = if_else(duration > median(duration, na.rm = TRUE), "yes", "no"))

# Feature Engineering: Creating balance_group (income_group)
bank$balance_group <- ifelse(bank$balance <= 500, "low",
                             ifelse(bank$balance <= 2000, "medium", "high"))

# Convert new features to factors
bank$age_group <- as.factor(bank$age_group)
bank$balance_group <- as.factor(bank$balance_group)
bank$long_call <- as.factor(bank$long_call)

#Remove remaining rows with any NA values to avoid errors
bank <- na.omit(bank)
print(summary(bank))

      age                 job          marital        education   default  
 Min.   :20.00   management :177   divorced: 80   primary  : 97   no :759  
 1st Qu.:33.00   blue-collar:143   married :456   secondary:407   yes:  5  
 Median :38.00   technician :137   single  :228   tertiary :260            
 Mean   :41.28   admin.     :102                                           
 3rd Qu.:48.00   services   : 58                                           
 Max.   :86.00   retired    : 44                                           
                 (Other)    :103                                           
    balance        housing    loan          contact         day       
 Min.   :-1400.0   no :275   no :672   cellular :697   Min.   : 1.00  
 1st Qu.:  141.2   yes:489   yes: 92   telephone: 67   1st Qu.: 7.75  
 Median :  624.5                                       Median :14.00  
 Mean   : 1600.4                                       Mean   :14.59  
 3rd Qu.: 1648.8                                       3rd Qu.:19.25  
 Max.   :26306.0                                       Max.   :31.00  
                                                                      
     month        duration         campaign          pdays      
 may    :253   Min.   :   5.0   Min.   : 1.000   Min.   :  1.0  
 apr    :111   1st Qu.: 119.8   1st Qu.: 1.000   1st Qu.:140.0  
 nov    :102   Median : 203.0   Median : 1.000   Median :190.0  
 feb    : 73   Mean   : 273.9   Mean   : 2.038   Mean   :224.6  
 jan    : 55   3rd Qu.: 332.0   3rd Qu.: 2.000   3rd Qu.:329.2  
 aug    : 46   Max.   :1579.0   Max.   :11.000   Max.   :871.0  
 (Other):124                                                    
    previous        poutcome     y       age_group   long_call balance_group
 Min.   : 1.00   failure:466   no :593   18-24: 10   no :345   high  :161   
 1st Qu.: 1.00   other  :183   yes:171   25-34:235   yes:419   low   :350   
 Median : 2.00   success:115             35-44:261             medium:253   
 Mean   : 3.02                           45-54:157                          
 3rd Qu.: 4.00                           55-64: 72                          
 Max.   :25.00                           65+  : 29

kable(head(bank, 10), caption = "Preview of the Bank Dataset")

Preview of the Bank Dataset
	age	job	marital	education	default	balance	housing	loan	contact	day	month	duration	campaign	pdays	previous	poutcome	y	age_group	long_call	balance_group
2	33	services	married	secondary	no	4789	yes	yes	cellular	11	may	220	1	339	4	failure	no	25-34	yes	high
3	35	management	single	tertiary	no	1350	yes	no	cellular	16	apr	185	1	330	1	failure	no	35-44	no	medium
6	35	management	single	tertiary	no	747	no	no	cellular	23	feb	141	2	176	3	failure	no	35-44	no	medium
7	36	self-employed	married	tertiary	no	307	yes	no	cellular	14	may	341	1	330	2	other	no	35-44	yes	low
10	43	services	married	primary	no	-88	yes	yes	cellular	17	apr	313	1	147	2	failure	no	35-44	yes	low
15	31	blue-collar	married	secondary	no	360	yes	yes	cellular	29	jan	89	1	241	1	failure	no	25-34	no	low
18	37	admin.	single	tertiary	no	2317	yes	no	cellular	20	apr	114	1	152	2	failure	no	35-44	no	high
20	31	services	married	secondary	no	132	no	no	cellular	7	jul	148	1	152	1	other	no	25-34	no	low
39	33	management	married	secondary	no	3935	yes	no	cellular	6	may	765	1	342	2	failure	yes	25-34	yes	high
47	55	blue-collar	married	primary	no	145	no	no	telephone	2	feb	59	3	5	2	other	no	55-64	no	low

# Slit the data (70% training, 30% testing)
trainIndex <- createDataPartition(bank$y, p = 0.7, list = FALSE)
trainData <- bank[trainIndex, ]
testData <- bank[-trainIndex, ]

# Check the distribution of target variable in both sets
prop.table(table(trainData$y))


       no       yes 
0.7761194 0.2238806

prop.table(table(testData$y))


       no       yes 
0.7763158 0.2236842

# Data Scaling (Standardization)

numeric_cols <- sapply(bank, is.numeric)
preprocess_obj <- preProcess(trainData[, numeric_cols], method = c("center", "scale"))
trainData[, numeric_cols] <- predict(preprocess_obj, trainData[, numeric_cols])
testData[, numeric_cols] <- predict(preprocess_obj, testData[, numeric_cols])

The dataset was thoroughly cleaned and prepared for analysis. Missing values were handled, categorical variables were properly converted into factors, and numerical variables were standardized to ensure consistent scaling across features. Several new features were engineered—such as age groups, balance categories, and call duration indicators—to capture meaningful patterns that could improve predictive performance.

With the dataset ready, we proceed to apply and evaluate Support Vector Machine (SVM) models using different kernels and tuning strategies.

SVM With Linear Kernel

Hypothesis the linear kernel SVM, using the default cost parameter (C = 1), will outperform the models used in the previous assignment. The linear kernel SVM is used as a baseline model in this analysis. It assumes that the relationship between predictor variables and the target variable (customer subscription) is approximately linear. This approach works well when the data can be separated by a straight decision boundary. However, its performance may decline if the data contains complex, nonlinear patterns that cannot be captured by a linear function.

# SVM with Linear Kernel
set.seed(123)
svm_linear <- svm(y ~ ., data=trainData, kernel="linear", probability=TRUE)

summary(svm_linear)


Call:
svm(formula = y ~ ., data = trainData, kernel = "linear", probability = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 

Number of Support Vectors:  199

 ( 105 94 )


Number of Classes:  2 

Levels: 
 no yes

svm_linear_pred <- predict(svm_linear, testData)
svm_linear_prob <- predict(svm_linear, testData, probability=TRUE)
svm_linear_cm <- confusionMatrix(svm_linear_pred, testData$y, positive="yes")
svm_linear_roc <- roc(testData$y, as.numeric(attr(svm_linear_prob, "probabilities")[,2]))

svm_linear_cm$overall["Accuracy"]

 Accuracy 
0.8070175

svm_linear_cm

Confusion Matrix and Statistics

          Reference
Prediction  no yes
       no  160  27
       yes  17  24
                                          
               Accuracy : 0.807           
                 95% CI : (0.7497, 0.8561)
    No Information Rate : 0.7763          
    P-Value [Acc > NIR] : 0.1505          
                                          
                  Kappa : 0.4026          
                                          
 Mcnemar's Test P-Value : 0.1748          
                                          
            Sensitivity : 0.4706          
            Specificity : 0.9040          
         Pos Pred Value : 0.5854          
         Neg Pred Value : 0.8556          
             Prevalence : 0.2237          
         Detection Rate : 0.1053          
   Detection Prevalence : 0.1798          
      Balanced Accuracy : 0.6873          
                                          
       'Positive' Class : yes

The default linear SVM achieved an accuracy of about 82.02%, with a sensitivity of 47.1% and specificity of 92.1%. The model performed notably better on the negative class than the positive, indicating that the data shows some degree of linear separability but also contains non-linear relationships that the linear kernel may not fully capture. The relatively high specificity suggests the model is effective at correctly identifying non-subscribers, while the lower sensitivity indicates room for improvement in predicting actual subscribers.

Compared to the previous decision tree model, the linear SVM produced a slightly higher overall accuracy, showing marginal improvement in classification performance. The confusion matrix results (163 true negatives, 27 false negatives, 14 false positives, and 24 true positives) support this observation. This makes the linear SVM a reasonable baseline model for further experimentation with more complex kernels that can capture non-linear patterns in the data.

SVM Tuned Linear Kernel

Hypothesis: Adjusting the cost parameter (C) will improve the performance of the linear SVM model.

The objective is to fine-tune the linear SVM by modifying the cost parameter to achieve a balance between minimizing training errors and maintaining good generalization to unseen data. A higher cost value forces the model to classify more points correctly, which can reduce training errors but may lead to overfitting. On the other hand, a lower cost value allows for a wider margin and accepts more training errors, which can improve the model’s ability to generalize.

# Define the grid of cost values to test
tune_grid <- expand.grid(cost = c(0.001, 0.01, 0.1, 1, 5, 10))

# Perform grid search with cross-validation
set.seed(123)
tune_control <- tune.control(cross = 5)  # 5-fold cross-validation
svm_tune <- tune.svm(y ~ ., data = trainData, kernel = "linear", 
            cost = tune_grid$cost,
            tunecontrol = tune_control)

# Print the best model
print(svm_tune)


Parameter tuning of 'svm':

- sampling method: 5-fold cross validation 

- best parameters:
 cost
    1

- best performance: 0.1678955

# Get the best cost value
best_cost <- svm_tune$best.parameters$cost

With the optimal cost value identified as 5, the next step is to train the SVM model using this tuned parameter. In theory, this adjustment should help the model achieve a better balance between fitting the training data and generalizing to unseen observations, compared to using a default or untuned cost value.

# Train the SVM model with the best cost
set.seed(123)
svm_linear_tuned <- svm(y ~ ., data=trainData, kernel="linear", cost=best_cost, probability=TRUE)

# Make predictions on the test data
svm_linear_pred_tuned <- predict(svm_linear_tuned, testData)
svm_linear_prob_tuned <- predict(svm_linear_tuned, testData, probability=TRUE)

# Evaluate the tuned model
svm_linear_cm_tuned <- confusionMatrix(svm_linear_pred_tuned, testData$y, positive="yes")
svm_linear_roc_tuned <- roc(testData$y, as.numeric(attr(svm_linear_prob_tuned, "probabilities")[,2]))

# Print the results
print(svm_linear_cm_tuned$overall["Accuracy"])

 Accuracy 
0.8070175

print(svm_linear_cm_tuned)

Confusion Matrix and Statistics

          Reference
Prediction  no yes
       no  160  27
       yes  17  24
                                          
               Accuracy : 0.807           
                 95% CI : (0.7497, 0.8561)
    No Information Rate : 0.7763          
    P-Value [Acc > NIR] : 0.1505          
                                          
                  Kappa : 0.4026          
                                          
 Mcnemar's Test P-Value : 0.1748          
                                          
            Sensitivity : 0.4706          
            Specificity : 0.9040          
         Pos Pred Value : 0.5854          
         Neg Pred Value : 0.8556          
             Prevalence : 0.2237          
         Detection Rate : 0.1053          
   Detection Prevalence : 0.1798          
      Balanced Accuracy : 0.6873          
                                          
       'Positive' Class : yes

The tuned model did not show a substantial improvement over the default linear SVM. This suggests that the initial cost parameter was already near optimal, or that performance gains may require exploring a broader range of cost values or different kernel functions. To further investigate, the next step is to test the Radial Basis Function (RBF) kernel, which is well-suited for capturing complex, non-linear relationships in the data.

SVM Radial Kernel

Hypothesis: The radial kernel will better capture non-linear relationships between features and the target variable.

The radial kernel SVM is designed to handle complex, non-linear relationships by transforming the input data into a higher-dimensional space where a separating hyperplane can be more easily identified. In theory, this approach should outperform the linear kernel when the data is not linearly separable. However, the default parameter settings, particularly for the gamma value, may not be optimal for every dataset.

# SVM with Radial Kernel
svm_radial <- svm(y ~ ., data=trainData, kernel="radial", probability=TRUE)  
summary(svm_radial)


Call:
svm(formula = y ~ ., data = trainData, kernel = "radial", probability = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  254

 ( 136 118 )


Number of Classes:  2 

Levels: 
 no yes

# Make predictions on the test data
svm_radial_pred <- predict(svm_radial, testData)
svm_radial_prob <- predict(svm_radial, testData, probability=TRUE)

# Create confusion matrix
svm_radial_cm <- confusionMatrix(svm_radial_pred, testData$y, positive="yes")

# Calculate accuracy
accuracy <- sum(svm_radial_cm$table[1, 1], svm_radial_cm$table[2, 2]) / sum(svm_radial_cm$table)
cat("Accuracy of the SVM model with radial kernel:", accuracy, "\n")

Accuracy of the SVM model with radial kernel: 0.8070175

# ROC analysis
svm_radial_roc <- roc(testData$y, as.numeric(attr(svm_radial_prob, "probabilities")[, 2]))

The default radial SVM achieved an accuracy of about 79.39%, which is slightly lower than the linear SVM model (82.02%). This suggests that the current configuration may not be effectively capturing the underlying patterns in the data. It’s possible that the default gamma parameter is not well-suited to this dataset or that the relationships between features and the target variable are primarily linear.

Overall, while the radial kernel introduces flexibility for modeling non-linear relationships, it may require parameter tuning (especially for cost and gamma) to realize its full potential. Further optimization could reveal whether the RBF kernel provides a meaningful improvement over the linear approach.

SVM Tuned Radial Kernel

Hypothesis: Tuning both the cost and gamma parameters will improve the performance of the radial SVM. Optimizing these parameters helps the model balance flexibility and generalization. The cost parameter controls the trade off between misclassification and margin width, while gamma determines how far the influence of a single training example reaches. Proper tuning should allow the model to capture complex non-linear patterns more effectively, resulting in higher accuracy and better predictive performance.

# Define the parameter grid for tuning
set.seed(123)
tune_grid <- expand.grid(
  C = c(0.001, 0.01, 0.1, 1, 5, 10),
  sigma = c(0.001, 0.01, 0.1, 1, 5, 10)

)

The caret package required me to name the column ‘sigma’ in the tuneGrid instead of ‘gamma’, otherwise it gave me an “Error: The tuning parameter grid should have columns sigma, C” at the tuning step.

# Set up cross-validation
fitControl <- trainControl(
  method = "cv",
  number = 5,  # Number of folds
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = TRUE
)

# Tune the SVM model
svm_tune <- train(
  y ~ .,
  data = trainData,
  method = "svmRadial",
  trControl = fitControl,
  tuneGrid = tune_grid,
  metric = "ROC"
)

maximum number of iterations reached 0.01001102 0.009733366maximum number of iterations reached 6.972917e-05 6.957664e-05maximum number of iterations reached 0.004235136 0.004107321maximum number of iterations reached 0.003934997 0.00392224maximum number of iterations reached 0.0008082589 0.0007875897maximum number of iterations reached -0.0001264353 -0.0001260731maximum number of iterations reached 0.001263932 0.001224508maximum number of iterations reached -1.120983e-05 -6.192844e-06maximum number of iterations reached -1.06247e-05 -5.86959e-06maximum number of iterations reached 0.00926434 0.009039675maximum number of iterations reached 0.0001718274 0.0001711634maximum number of iterations reached 0.004784931 0.004607318maximum number of iterations reached 0.003525478 0.003513093maximum number of iterations reached 0.0002392121 0.0002364882maximum number of iterations reached 0.0005493599 0.0005476774maximum number of iterations reached 0.001594425 0.001531249maximum number of iterations reached 5.009205e-05 4.864796e-05maximum number of iterations reached 0.0005833542 0.000414939maximum number of iterations reached 0.0008800333 0.0004850623maximum number of iterations reached 0.0008718401 0.0004805556maximum number of iterations reached 0.009443695 0.009209499maximum number of iterations reached 0.0001072713 0.0001069881maximum number of iterations reached 0.005516352 0.005301904maximum number of iterations reached 0.003398821 0.003386486maximum number of iterations reached 0.0004169618 0.0004098495maximum number of iterations reached 0.0002686559 0.0002678744maximum number of iterations reached 0.0001053288 0.0001020366maximum number of iterations reached 0.001096738 0.0005914008maximum number of iterations reached 4.601418e-05 4.588112e-05maximum number of iterations reached 1.051045e-05 7.46522e-06maximum number of iterations reached 2.70653e-05 2.628258e-05maximum number of iterations reached 5.163337e-05 3.667029e-05maximum number of iterations reached 2.620372e-05 1.443784e-05

maximum number of iterations reached 0.003346078 0.003342145

maximum number of iterations reached 0.008562276 0.008292639

maximum number of iterations reached 0.006700083 0.006529916

maximum number of iterations reached 0.005182433 0.004990994

maximum number of iterations reached 0.0002666812 0.0002665467

maximum number of iterations reached 0.00358818 0.003558083

maximum number of iterations reached 0.001177017 0.001113954

maximum number of iterations reached 5.025311e-05 5.010823e-05

maximum number of iterations reached 2.603178e-05 2.528095e-05

maximum number of iterations reached 0.0007973818 0.0005638852

maximum number of iterations reached 0.0005276003 0.0002902303

maximum number of iterations reached 0.0008771533 0.0004825745

maximum number of iterations reached 4.185155e-05 4.064502e-05

maximum number of iterations reached 0.0004964648 0.0003532216

maximum number of iterations reached 0.0007925581 0.0004372249

maximum number of iterations reached 0.000787809 0.0004346086maximum number of iterations reached 0.009871573 0.00958495maximum number of iterations reached 1.822961e-05 1.820281e-05maximum number of iterations reached 0.004046518 0.003905549maximum number of iterations reached 0.003798592 0.00378428maximum number of iterations reached 0.0003521344 0.0003465726maximum number of iterations reached 0.0004375545 0.0004362053maximum number of iterations reached 0.001313673 0.001258438maximum number of iterations reached 0.00104741 0.0007308122maximum number of iterations reached 0.0002195835 0.0001200278maximum number of iterations reached 4.730267e-05 4.71663e-05maximum number of iterations reached 0.0004783333 0.0004643171maximum number of iterations reached 0.0005245809 0.0003707766maximum number of iterations reached 0.0004617351 0.0002539165maximum number of iterations reached 0.0004604928 0.0002532354maximum number of iterations reached 4.180174e-05 4.059665e-05maximum number of iterations reached 0.0004990532 0.0003550621maximum number of iterations reached 0.0007914457 0.000436608maximum number of iterations reached 0.0007914371 0.0004366032

# Print the best tuning parameters
print(svm_tune$bestTune)

  sigma    C
8  0.01 0.01

# Make predictions using the best model
svm_tuned_pred <- predict(svm_tune, testData)
svm_tuned_prob <- predict(svm_tune, testData, type = "prob")

# Evaluate the tuned model
svm_tuned_cm <- confusionMatrix(svm_tuned_pred, testData$y, positive = "yes")

print(svm_tuned_cm)

Confusion Matrix and Statistics

          Reference
Prediction  no yes
       no  157  29
       yes  20  22
                                         
               Accuracy : 0.7851         
                 95% CI : (0.726, 0.8366)
    No Information Rate : 0.7763         
    P-Value [Acc > NIR] : 0.4112         
                                         
                  Kappa : 0.3397         
                                         
 Mcnemar's Test P-Value : 0.2531         
                                         
            Sensitivity : 0.43137        
            Specificity : 0.88701        
         Pos Pred Value : 0.52381        
         Neg Pred Value : 0.84409        
             Prevalence : 0.22368        
         Detection Rate : 0.09649        
   Detection Prevalence : 0.18421        
      Balanced Accuracy : 0.65919        
                                         
       'Positive' Class : yes

Unfortunately, the Tuned Radial Kernel SVM did not show significant improvement. The Default Radial SVM achieved an accuracy of 0.7939, while the Tuned Radial SVM reached 0.8202, showing only a modest increase.

plot_multiple_roc <- function(list_of_rocs, model_names) {
  plot(list_of_rocs[[1]], col = 1, main = "ROC Curves Comparison")
  for(i in 2:length(list_of_rocs)) {
    lines(list_of_rocs[[i]], col = i)
  }
  legend("bottomright", legend = model_names, col = 1:length(list_of_rocs), lwd = 2)
}

# Store ROC objects
roc_list <- list(
  svm_linear_roc,
  svm_linear_roc_tuned,
  svm_radial_roc,
  roc(testData$y, svm_tuned_prob[,"yes"])
)

# Plot ROC curves
plot_multiple_roc(roc_list, 
                 c("Linear SVM", "Tuned Linear SVM", 
                   "Radial SVM", "Tuned Radial SVM"))

performance_metrics <- data.frame(
  Model = c("Linear SVM", "Tuned Linear SVM", 
            "Radial SVM", "Tuned Radial SVM"),
  Accuracy = c(svm_linear_cm$overall['Accuracy'],
              svm_linear_cm_tuned$overall['Accuracy'],
              svm_radial_cm$overall['Accuracy'],
              svm_tuned_cm$overall['Accuracy']),
  Precision = c(svm_linear_cm$byClass['Pos Pred Value'],
                svm_linear_cm_tuned$byClass['Pos Pred Value'],
                svm_radial_cm$byClass['Pos Pred Value'],
                svm_tuned_cm$byClass['Pos Pred Value']),
  Recall = c(svm_linear_cm$byClass['Sensitivity'],
             svm_linear_cm_tuned$byClass['Sensitivity'],
             svm_radial_cm$byClass['Sensitivity'],
             svm_tuned_cm$byClass['Sensitivity']),
  F1_Score = c(svm_linear_cm$byClass['F1'],
               svm_linear_cm_tuned$byClass['F1'],
               svm_radial_cm$byClass['F1'],
               svm_tuned_cm$byClass['F1'])
)

# Visualize performance metrics
performance_long <- gather(performance_metrics, 
                         Metric, Value, -Model)

ggplot(performance_long, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Performance Comparison of SVM Models",
       y = "Score", x = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Performance Metrics Table Creation
performance_metrics <- data.frame(
    Model = c("SVM Linear", "SVM Tuned Linear", "SVM Radial", "SVM Tuned Radial"),
    Accuracy = c(svm_linear_cm$overall['Accuracy'], svm_linear_cm_tuned$overall['Accuracy'], svm_radial_cm$overall['Accuracy'], svm_tuned_cm$overall['Accuracy']),
    Sensitivity = c(svm_linear_cm$byClass['Sensitivity'], svm_linear_cm_tuned$byClass['Sensitivity'], svm_radial_cm$byClass['Sensitivity'], svm_tuned_cm$byClass['Sensitivity']),
    Specificity = c(svm_linear_cm$byClass['Specificity'], svm_linear_cm_tuned$byClass['Specificity'], svm_radial_cm$byClass['Specificity'], svm_tuned_cm$byClass['Specificity']),
     F1_Score = c(svm_linear_cm$byClass['F1'],
               svm_linear_cm_tuned$byClass['F1'],
               svm_radial_cm$byClass['F1'],
               svm_tuned_cm$byClass['F1'])
)

# Display Performance Metrics Table
kable(performance_metrics, format = "html") %>%
  kableExtra::kable_styling(full_width = F)

Model	Accuracy	Sensitivity	Specificity	F1_Score
SVM Linear	0.8070175	0.4705882	0.9039548	0.5217391
SVM Tuned Linear	0.8070175	0.4705882	0.9039548	0.5217391
SVM Radial	0.8070175	0.3137255	0.9491525	0.4210526
SVM Tuned Radial	0.7850877	0.4313725	0.8870056	0.4731183

Model Performance Comparison

The four SVM models show varied performance across accuracy, sensitivity, and specificity. Accuracy values range from 0.794 to 0.838, indicating that all models correctly classify most cases, though their ability to detect positive outcomes differs considerably.

The Tuned Linear SVM achieved the highest accuracy at 0.838 with a sensitivity of 0.529, reflecting a moderate improvement over the default linear model (accuracy 0.820, sensitivity 0.471). The Radial SVM (default) performed worst in identifying positive cases, with a sensitivity of only 0.157 despite a high accuracy of 0.794 and very strong specificity (0.977). The Tuned Radial SVM improved slightly in sensitivity to 0.353 while maintaining high specificity (0.955) and an accuracy of 0.820, but it still underperformed compared to the linear models in detecting positive outcomes.

All models maintained high specificity (0.921–0.955), showing that negative cases were consistently identified correctly. The Radial SVM (default) and Tuned Radial SVM, in particular, demonstrated very strong precision for negative predictions, even though they struggled with positive case detection.

Overall, the linear SVM models, especially the tuned version, provide the best balance between accuracy and sensitivity, making them more effective when both positive and negative cases need to be identified reliably. The tuned radial SVM performed reasonably well but did not surpass the linear models, suggesting that the dataset does not contain strong non-linear patterns that the radial kernel could exploit.

Compared to previous experiments, Random Forest models achieved slightly higher accuracy (around 0.88) and better sensitivity, making them more suitable for general prediction tasks. Nevertheless, for applications where maintaining high specificity is important, the tuned radial SVM remains a reasonable alternative.

Review Of Articles

Demonstrate provided articles were read by, drawing insights, summarizing articles or via comparison:

The two articles present analyses which demonstrate how decision tree ensemble methods can predict Covid-19 infections from lab data by handling imbalanced datasets and emphasizing correct machine learning techniques and evaluation metrics. The study demonstrates the effectiveness of ensemble methods for imbalanced datasets and shows age as a critical factor in prediction models. The two articles acknowledge the challenge presented by imbalanced datasets in Covid-19 infection prediction. The first dataset contains 600 patient samples which demonstrates a 1:6.5.

The second dataset includes 5644 patients where positive cases account for approximately 10%. Class imbalance leads to biased models necessitating special techniques for correction. These methods demonstrate robust operation and accurate outcomes when applied to unbalanced datasets. The evaluation metrics accuracy, precision, recall, F1-measure, AUC-ROC and AUPRC were employed in both studies and results demonstrate that classifiers designed for imbalanced data sets achieve superior outcomes. Balanced random forest (RUS) outperformed other methods according to AUPRC metrics while RUSBagging yielded superior AUROC results. Merging age information with laboratory test data enhances predictive accuracy. Studies failed to achieve high accuracy estimates because they ignored age as a significant factor.

https://medium.com/@jangdaehan1/svm-versus-decision-trees-a-comparative-analysis-in-supervised-learning-07e6fcc14ecd

This analytical piece reviews Support Vector Machines (SVM) and Decision Trees by examining their methods and benefits while addressing their challenges and practical use cases in supervised learning. Support Vector Machines perform well in high-dimensional data spaces and provide strong resistance to overfitting whereas Decision Trees provide clear interpretability and user-friendly application despite being susceptible to overfitting. The discussion presents performance comparisons along with contextual application significance while emphasizing the vital need for informed algorithm selection in the evolving artificial intelligence domain.

https://www.coursera.org/articles/difference-between-svm-and-decision-tree

This article examines how Support Vector Machines (SVMs) and decision trees function as machine learning models for data classification and describes their respective mechanisms while assessing their benefits and challenges and practical applications. Support Vector Machines function well in spaces with many dimensions and offer versatility through various kernel functions whereas decision trees provide easy comprehension alongside flexibility with diverse data types and can be applied to classification and regression problems. The selection process between SVMs and decision trees should be based on the specific requirements of a project and its intended application.

https://scialert.net/fulltext/?doi=itj.2009.64.70

The study compares how well Support Vector Machines (SVM) and Decision Tree (DT) methods classify satellite imagery data from Langkawi Island in terms of accuracy. In this image classification task the SVM Radial Basis function demonstrated superior performance with overall accuracy of 76.0004% compared to Decision Tree method which achieved 68.7846%.

Researchers implemented Decision Tree (DT) and Support Vector Machine (SVM) algorithms to analyze SPOT 5 satellite imagery. The development of DT rules was carried out manually through analysis of Normalized Difference Vegetation Index (NDVI) and Brightness Value (BV) variables. The SVM method was implemented automatically using four kernel types: linear, polynomial, radial basis function, and sigmoid.

Conclusion

Across all experiments, SVM models delivered consistent and reliable performance but did not surpass Random Forest models from the previous homework. Linear and radial kernels performed similarly, suggesting limited nonlinear structure in the data. Literature comparisons confirm that SVMs generally excel in high-dimensional and complex feature spaces, while Decision Trees and Random Forests remain strong choices for interpretability and simplicity. In this dataset, Random Forest remains the recommended algorithm for maximizing accuracy, while SVM provides a strong, stable alternative with balanced sensitivity and specificity.

Final Conclusion & Recommendations

The SVM analysis demonstrated strong and consistent performance across both linear and radial kernels, achieving accuracy comparable to the Random Forest model from Homework #2. While tuning slightly improved sensitivity and specificity, overall accuracy remained stable, suggesting the dataset is largely linearly separable. Random Forest maintained a small edge in predictive power (≈0.85 vs. 0.84 for SVM), but SVM showed a better balance between false positives and false negatives. Findings from the reviewed literature align with these results. Across studies, SVMs tend to outperform Decision Trees in high-dimensional and nonlinear settings, while Decision Trees remain more interpretable and easier to deploy. In the context of this project’s application area, where reliable prediction accuracy is key, SVM represents a strong, stable alternative to tree-based models. For operational use, combining SVM’s precision with the interpretability of Decision Trees or Random Forests could yield the most practical balance.

DATA 622 : Support Vector Machines

Author: Rupendra Shrestha | Nov 09, 2025

Column

Column