Instructions
Homework #3
Read the following articles:
Search for academic content (at least 3 articles) that compare the use of decision trees vs. SVMs in your current area of expertise.
Perform an analysis of the dataset used in Homework #2 using the SVM algorithm.
Compare the results with the results from previous homework.
Answer questions, such as:
Introduction
In this assignment, we apply the Support Vector Machine (SVM) algorithm to the dataset used in our previous homework (Homework #2). The goal is two-fold: first, to explore the performance of SVM under different kernels and tuning parameters; and second, to compare those results with the findings from the earlier work, thereby determining whether SVM offers measurable improvements in our context.
Support Vector Machines represent a powerful class of supervised learning methods widely used for both classification and regression tasks. By maximizing the margin between classes (or performing regression via the so-called epsilon-insensitive loss), SVMs aim for strong generalization and robustness. Moreover, through the use of kernel functions they can flexibly handle non-linear relationships in the data.
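For reference, the soft-margin objective behind the C-classification used throughout this report (a standard textbook formulation, not output from this analysis) is

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,$$

where the cost parameter C, tuned later in this report, controls the trade-off between margin width and training error: a larger C penalizes misclassification more heavily, while a smaller C widens the margin and tolerates more training error.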
Abstract
This assignment investigates the application of Support Vector Machines (SVM) to the dataset from Homework #2. The study involves training SVM models using multiple kernels and hyper‑parameter tuning, validating their performance, and comparing results with the algorithms used in the previous homework. Findings are analyzed in the context of existing literature comparing decision trees and SVM, with a focus on accuracy, robustness, and suitability for the domain. The analysis aims to determine whether SVM provides improved predictive performance and practical advantages over earlier methods.
Data Set
A Portuguese bank conducted a marketing campaign (phone calls) to predict whether a client would subscribe to a term deposit. The records of these efforts are available in the form of a dataset. The objective here is to apply machine learning techniques to analyze the dataset and identify the most effective tactics to help the bank persuade more customers to subscribe to a term deposit in its next campaign. Download the Bank Marketing Dataset from: https://archive.ics.uci.edu/dataset/222/bank+marketing
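If fetching the file programmatically, a minimal sketch follows; the direct zip URL and the nested-archive layout are assumptions about the UCI site and may need adjusting:
# Sketch: fetch and extract bank.csv (URL and nested-zip layout are assumptions)
url <- "https://archive.ics.uci.edu/static/public/222/bank+marketing.zip"
download.file(url, destfile = "bank_marketing.zip", mode = "wb")
unzip("bank_marketing.zip")  # expected to contain bank.zip
unzip("bank.zip")            # expected to contain bank.csv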
Load libraries
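The analysis below assumes the following packages are loaded (reconstructed from the functions used in this report; kernlab is attached automatically by caret's svmRadial method):
# Packages assumed by the code in this report
library(dplyr)      # %>%, mutate_all, if_else
library(caret)      # createDataPartition, preProcess, train, confusionMatrix
library(e1071)      # svm, tune.svm, tune.control
library(pROC)       # roc
library(knitr)      # kable
library(kableExtra) # kable_styling
library(ggplot2)    # performance bar chart
library(tidyr)      # gather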
I followed the same data-preparation steps as in the previous homework. In addition to those pre-processing steps, data scaling was added.
# Read a CSV file
bank <- read.csv("bank.csv", sep = ";")
# Preview the first few rows of the dataset
#kable(head(bank, 10), caption = "Preview of the Bank Dataset")
# Replace "unknown" with NA
bank <- bank %>% mutate_all(~ifelse(. == "unknown", NA, .))
# Handle missing values
for (col in names(bank)) {
  if (is.factor(bank[[col]])) {
    mode_val <- names(sort(table(bank[[col]]), decreasing = TRUE))[1]
    bank[[col]][is.na(bank[[col]])] <- mode_val
  }
}
# Convert categorical variables to factors
bank <- data.frame(lapply(bank, function(x) if(is.character(x)) factor(x) else x))
# Feature Engineering: Creating age_group
bank$age_group <- cut(bank$age, breaks = c(17, 24, 34, 44, 54, 64, 100),
labels = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"))
# Create a new feature based on call duration
bank <- bank %>% mutate(long_call = if_else(duration > median(duration, na.rm = TRUE), "yes", "no"))
# Feature Engineering: Creating balance_group (income_group)
bank$balance_group <- ifelse(bank$balance <= 500, "low",
ifelse(bank$balance <= 2000, "medium", "high"))
# Convert new features to factors
bank$age_group <- as.factor(bank$age_group)
bank$balance_group <- as.factor(bank$balance_group)
bank$long_call <- as.factor(bank$long_call)
#Remove remaining rows with any NA values to avoid errors
bank <- na.omit(bank)
print(summary(bank))
      age                 job          marital        education     default
Min. :20.00 management :177 divorced: 80 primary : 97 no :759
1st Qu.:33.00 blue-collar:143 married :456 secondary:407 yes: 5
Median :38.00 technician :137 single :228 tertiary :260
Mean :41.28 admin. :102
3rd Qu.:48.00 services : 58
Max. :86.00 retired : 44
(Other) :103
balance housing loan contact day
Min. :-1400.0 no :275 no :672 cellular :697 Min. : 1.00
1st Qu.: 141.2 yes:489 yes: 92 telephone: 67 1st Qu.: 7.75
Median : 624.5 Median :14.00
Mean : 1600.4 Mean :14.59
3rd Qu.: 1648.8 3rd Qu.:19.25
Max. :26306.0 Max. :31.00
month duration campaign pdays
may :253 Min. : 5.0 Min. : 1.000 Min. : 1.0
apr :111 1st Qu.: 119.8 1st Qu.: 1.000 1st Qu.:140.0
nov :102 Median : 203.0 Median : 1.000 Median :190.0
feb : 73 Mean : 273.9 Mean : 2.038 Mean :224.6
jan : 55 3rd Qu.: 332.0 3rd Qu.: 2.000 3rd Qu.:329.2
aug : 46 Max. :1579.0 Max. :11.000 Max. :871.0
(Other):124
previous poutcome y age_group long_call balance_group
Min. : 1.00 failure:466 no :593 18-24: 10 no :345 high :161
1st Qu.: 1.00 other :183 yes:171 25-34:235 yes:419 low :350
Median : 2.00 success:115 35-44:261 medium:253
Mean : 3.02 45-54:157
3rd Qu.: 4.00 55-64: 72
Max. :25.00 65+ : 29
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y | age_group | long_call | balance_group | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 33 | services | married | secondary | no | 4789 | yes | yes | cellular | 11 | may | 220 | 1 | 339 | 4 | failure | no | 25-34 | yes | high |
| 3 | 35 | management | single | tertiary | no | 1350 | yes | no | cellular | 16 | apr | 185 | 1 | 330 | 1 | failure | no | 35-44 | no | medium |
| 6 | 35 | management | single | tertiary | no | 747 | no | no | cellular | 23 | feb | 141 | 2 | 176 | 3 | failure | no | 35-44 | no | medium |
| 7 | 36 | self-employed | married | tertiary | no | 307 | yes | no | cellular | 14 | may | 341 | 1 | 330 | 2 | other | no | 35-44 | yes | low |
| 10 | 43 | services | married | primary | no | -88 | yes | yes | cellular | 17 | apr | 313 | 1 | 147 | 2 | failure | no | 35-44 | yes | low |
| 15 | 31 | blue-collar | married | secondary | no | 360 | yes | yes | cellular | 29 | jan | 89 | 1 | 241 | 1 | failure | no | 25-34 | no | low |
| 18 | 37 | admin. | single | tertiary | no | 2317 | yes | no | cellular | 20 | apr | 114 | 1 | 152 | 2 | failure | no | 35-44 | no | high |
| 20 | 31 | services | married | secondary | no | 132 | no | no | cellular | 7 | jul | 148 | 1 | 152 | 1 | other | no | 25-34 | no | low |
| 39 | 33 | management | married | secondary | no | 3935 | yes | no | cellular | 6 | may | 765 | 1 | 342 | 2 | failure | yes | 25-34 | yes | high |
| 47 | 55 | blue-collar | married | primary | no | 145 | no | no | telephone | 2 | feb | 59 | 3 | 5 | 2 | other | no | 55-64 | no | low |
# Split the data (70% training, 30% testing)
set.seed(123)  # for a reproducible partition
trainIndex <- createDataPartition(bank$y, p = 0.7, list = FALSE)
trainData <- bank[trainIndex, ]
testData <- bank[-trainIndex, ]
# Check the distribution of target variable in both sets
prop.table(table(trainData$y))
no yes
0.7761194 0.2238806
prop.table(table(testData$y))
       no       yes
0.7763158 0.2236842
# Data Scaling (Standardization)
numeric_cols <- sapply(bank, is.numeric)
preprocess_obj <- preProcess(trainData[, numeric_cols], method = c("center", "scale"))
trainData[, numeric_cols] <- predict(preprocess_obj, trainData[, numeric_cols])
testData[, numeric_cols] <- predict(preprocess_obj, testData[, numeric_cols])

The dataset was thoroughly cleaned and prepared for analysis. Missing values were handled, categorical variables were converted into factors, and numerical variables were standardized to ensure consistent scaling across features. Several new features were engineered, such as age groups, balance categories, and a call-duration indicator, to capture meaningful patterns that could improve predictive performance.
With the dataset ready, we proceed to apply and evaluate Support Vector Machine (SVM) models using different kernels and tuning strategies.
SVM With Linear Kernel
Hypothesis: the linear kernel SVM, using the default cost parameter (C = 1), will outperform the models used in the previous assignment.

The linear kernel SVM is used as the baseline model in this analysis. It assumes that the relationship between the predictor variables and the target variable (customer subscription) is approximately linear. This approach works well when the data can be separated by a straight decision boundary; however, its performance may decline if the data contains complex, nonlinear patterns that a linear function cannot capture.
# SVM with Linear Kernel
set.seed(123)
svm_linear <- svm(y ~ ., data=trainData, kernel="linear", probability=TRUE)
summary(svm_linear)
Call:
svm(formula = y ~ ., data = trainData, kernel = "linear", probability = TRUE)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1
Number of Support Vectors: 199
( 105 94 )
Number of Classes: 2
Levels:
no yes
svm_linear_pred <- predict(svm_linear, testData)
svm_linear_prob <- predict(svm_linear, testData, probability=TRUE)
svm_linear_cm <- confusionMatrix(svm_linear_pred, testData$y, positive="yes")
svm_linear_roc <- roc(testData$y, as.numeric(attr(svm_linear_prob, "probabilities")[,2]))
svm_linear_cm$overall["Accuracy"]
 Accuracy
0.8070175
Confusion Matrix and Statistics
Reference
Prediction no yes
no 160 27
yes 17 24
Accuracy : 0.807
95% CI : (0.7497, 0.8561)
No Information Rate : 0.7763
P-Value [Acc > NIR] : 0.1505
Kappa : 0.4026
Mcnemar's Test P-Value : 0.1748
Sensitivity : 0.4706
Specificity : 0.9040
Pos Pred Value : 0.5854
Neg Pred Value : 0.8556
Prevalence : 0.2237
Detection Rate : 0.1053
Detection Prevalence : 0.1798
Balanced Accuracy : 0.6873
'Positive' Class : yes
The default linear SVM achieved an accuracy of about 80.7%, with a sensitivity of 47.1% and a specificity of 90.4%. The model performed notably better on the negative class than on the positive one, indicating that the data shows some degree of linear separability but also contains relationships that the linear kernel may not fully capture. The relatively high specificity suggests the model is effective at correctly identifying non-subscribers, while the lower sensitivity indicates room for improvement in predicting actual subscribers.

Compared to the previous decision tree model, the linear SVM produced a slightly higher overall accuracy, a marginal improvement in classification performance. The confusion matrix results (160 true negatives, 27 false negatives, 17 false positives, and 24 true positives) support this observation. This makes the linear SVM a reasonable baseline model for further experimentation with more complex kernels that can capture non-linear patterns in the data.
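As a quick sanity check, these headline rates follow directly from the confusion matrix above (a small verification snippet, not part of the modeling pipeline):
# Recompute the reported metrics from the confusion matrix counts
tp <- 24; fn <- 27; fp <- 17; tn <- 160
tp / (tp + fn)                    # sensitivity: 24/51   = 0.4706
tn / (tn + fp)                    # specificity: 160/177 = 0.9040
(tp + tn) / (tp + tn + fp + fn)   # accuracy:    184/228 = 0.8070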
SVM Tuned Linear Kernel
Hypothesis: Adjusting the cost parameter (C) will improve the performance of the linear SVM model.
The objective is to fine-tune the linear SVM by modifying the cost parameter to achieve a balance between minimizing training errors and maintaining good generalization to unseen data. A higher cost value forces the model to classify more points correctly, which can reduce training errors but may lead to overfitting. On the other hand, a lower cost value allows for a wider margin and accepts more training errors, which can improve the model’s ability to generalize.
# Define the grid of cost values to test
tune_grid <- expand.grid(cost = c(0.001, 0.01, 0.1, 1, 5, 10))
# Perform grid search with cross-validation
set.seed(123)
tune_control <- tune.control(cross = 5) # 5-fold cross-validation
svm_tune <- tune.svm(y ~ ., data = trainData, kernel = "linear",
cost = tune_grid$cost,
tunecontrol = tune_control)
# Print the best model
print(svm_tune)
Parameter tuning of 'svm':
- sampling method: 5-fold cross validation
- best parameters:
cost
1
- best performance: 0.1678955
With the optimal cost value identified as 1 (identical to the default), the next step trains the SVM using this parameter. In theory, a tuned cost balances fitting the training data against generalizing to unseen observations; here, the cross-validated search simply confirmed the default value.
# Extract the best cost and retrain the SVM model with it
best_cost <- svm_tune$best.parameters$cost
set.seed(123)
svm_linear_tuned <- svm(y ~ ., data=trainData, kernel="linear", cost=best_cost, probability=TRUE)
# Make predictions on the test data
svm_linear_pred_tuned <- predict(svm_linear_tuned, testData)
svm_linear_prob_tuned <- predict(svm_linear_tuned, testData, probability=TRUE)
# Evaluate the tuned model
svm_linear_cm_tuned <- confusionMatrix(svm_linear_pred_tuned, testData$y, positive="yes")
svm_linear_roc_tuned <- roc(testData$y, as.numeric(attr(svm_linear_prob_tuned, "probabilities")[,2]))
svm_linear_cm_tuned$overall["Accuracy"]
 Accuracy
0.8070175
Confusion Matrix and Statistics
Reference
Prediction no yes
no 160 27
yes 17 24
Accuracy : 0.807
95% CI : (0.7497, 0.8561)
No Information Rate : 0.7763
P-Value [Acc > NIR] : 0.1505
Kappa : 0.4026
Mcnemar's Test P-Value : 0.1748
Sensitivity : 0.4706
Specificity : 0.9040
Pos Pred Value : 0.5854
Neg Pred Value : 0.8556
Prevalence : 0.2237
Detection Rate : 0.1053
Detection Prevalence : 0.1798
Balanced Accuracy : 0.6873
'Positive' Class : yes
The tuned model showed no improvement over the default linear SVM, which is expected since the cross-validated search selected cost = 1, the default value. Performance gains may require exploring a broader range of cost values or different kernel functions. To further investigate, the next step is to test the Radial Basis Function (RBF) kernel, which is well suited to capturing complex, non-linear relationships in the data.
SVM Radial Kernel
Hypothesis: The radial kernel will better capture non-linear relationships between features and the target variable.
The radial kernel SVM is designed to handle complex, non-linear relationships by transforming the input data into a higher-dimensional space where a separating hyperplane can be more easily identified. In theory, this approach should outperform the linear kernel when the data is not linearly separable. However, the default parameter settings, particularly for the gamma value, may not be optimal for every dataset.
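In the background, the RBF kernel computes similarity as

$$K(x, x') = \exp\!\left(-\gamma\,\lVert x - x'\rVert^2\right),$$

where a small gamma produces smooth, near-linear decision boundaries and a large gamma confines each support vector's influence to a small neighborhood. By default, e1071 sets gamma to 1/(number of predictors), which may or may not suit this dataset.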
# SVM with Radial Kernel
svm_radial <- svm(y ~ ., data=trainData, kernel="radial", probability=TRUE)
summary(svm_radial)
Call:
svm(formula = y ~ ., data = trainData, kernel = "radial", probability = TRUE)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
Number of Support Vectors: 254
( 136 118 )
Number of Classes: 2
Levels:
no yes
# Make predictions on the test data
svm_radial_pred <- predict(svm_radial, testData)
svm_radial_prob <- predict(svm_radial, testData, probability=TRUE)
# Create confusion matrix
svm_radial_cm <- confusionMatrix(svm_radial_pred, testData$y, positive="yes")
# Calculate accuracy
accuracy <- sum(svm_radial_cm$table[1, 1], svm_radial_cm$table[2, 2]) / sum(svm_radial_cm$table)
cat("Accuracy of the SVM model with radial kernel:", accuracy, "\n")Accuracy of the SVM model with radial kernel: 0.8070175
# ROC analysis
svm_radial_roc <- roc(testData$y, as.numeric(attr(svm_radial_prob, "probabilities")[, 2]))

The default radial SVM achieved an accuracy of about 80.7%, matching the linear SVM's overall accuracy but with noticeably lower sensitivity (31.4% vs. 47.1%). This suggests the default configuration is not effectively capturing the positive class. It is possible that the default gamma parameter is not well suited to this dataset, or that the relationships between features and the target variable are primarily linear.
Overall, while the radial kernel introduces flexibility for modeling non-linear relationships, it may require parameter tuning (especially for cost and gamma) to realize its full potential. Further optimization could reveal whether the RBF kernel provides a meaningful improvement over the linear approach.
SVM Tuned Radial Kernel
Hypothesis: Tuning both the cost and gamma parameters will improve the performance of the radial SVM. Optimizing these parameters helps the model balance flexibility and generalization. The cost parameter controls the trade-off between misclassification and margin width, while gamma determines how far the influence of a single training example reaches. Proper tuning should allow the model to capture complex non-linear patterns more effectively, resulting in higher accuracy and better predictive performance.
# Define the parameter grid for tuning
set.seed(123)
tune_grid <- expand.grid(
C = c(0.001, 0.01, 0.1, 1, 5, 10),
sigma = c(0.001, 0.01, 0.1, 1, 5, 10)
)

The caret package required naming the tuning column 'sigma' (rather than 'gamma') in the tuneGrid; otherwise it raised "Error: The tuning parameter grid should have columns sigma, C" at the tuning step. This is because caret's svmRadial method is backed by kernlab, which calls the RBF width parameter sigma.
# Set up cross-validation
fitControl <- trainControl(
method = "cv",
number = 5, # Number of folds
classProbs = TRUE,
summaryFunction = twoClassSummary,
savePredictions = TRUE
)

# Tune the SVM model
svm_tune <- train(
y ~ .,
data = trainData,
method = "svmRadial",
trControl = fitControl,
tuneGrid = tune_grid,
metric = "ROC"
)
## kernlab issued repeated "maximum number of iterations reached" convergence
## warnings for several parameter combinations during tuning; the verbose
## warning stream is omitted.
svm_tune$bestTune
   sigma    C
8   0.01 0.01
# Make predictions using the best model
svm_tuned_pred <- predict(svm_tune, testData)
svm_tuned_prob <- predict(svm_tune, testData, type = "prob")
# Evaluate the tuned model
svm_tuned_cm <- confusionMatrix(svm_tuned_pred, testData$y, positive = "yes")
print(svm_tuned_cm)
Confusion Matrix and Statistics
Reference
Prediction no yes
no 157 29
yes 20 22
Accuracy : 0.7851
95% CI : (0.726, 0.8366)
No Information Rate : 0.7763
P-Value [Acc > NIR] : 0.4112
Kappa : 0.3397
Mcnemar's Test P-Value : 0.2531
Sensitivity : 0.43137
Specificity : 0.88701
Pos Pred Value : 0.52381
Neg Pred Value : 0.84409
Prevalence : 0.22368
Detection Rate : 0.09649
Detection Prevalence : 0.18421
Balanced Accuracy : 0.65919
'Positive' Class : yes
Unfortunately, the tuned radial kernel SVM did not improve on the default: the default radial SVM achieved an accuracy of 0.8070, while the tuned radial SVM reached only 0.7851. The ROC-optimized grid selected very small values (sigma = 0.01, C = 0.01), which appear to underfit on the accuracy scale even though they scored best on cross-validated ROC.
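A natural follow-up, sketched below but not run here, is to retune directly in e1071 (whose gamma parametrization matches the fitted model) over a grid centered on the default gamma of 1/(number of predictors); the grid values are illustrative assumptions:
# Sketch only: e1071 grid search over gamma and cost for the radial kernel
set.seed(123)
svm_radial_tune <- tune.svm(y ~ ., data = trainData, kernel = "radial",
                            gamma = c(0.001, 0.005, 0.01, 0.05, 0.1),
                            cost  = c(0.1, 1, 5, 10, 50),
                            tunecontrol = tune.control(cross = 5))
summary(svm_radial_tune)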
plot_multiple_roc <- function(list_of_rocs, model_names) {
  plot(list_of_rocs[[1]], col = 1, main = "ROC Curves Comparison")
  for (i in 2:length(list_of_rocs)) {
    lines(list_of_rocs[[i]], col = i)
  }
  legend("bottomright", legend = model_names, col = 1:length(list_of_rocs), lwd = 2)
}
# Store ROC objects
roc_list <- list(
svm_linear_roc,
svm_linear_roc_tuned,
svm_radial_roc,
roc(testData$y, svm_tuned_prob[,"yes"])
)

# Plot ROC curves
plot_multiple_roc(roc_list,
c("Linear SVM", "Tuned Linear SVM",
"Radial SVM", "Tuned Radial SVM"))performance_metrics <- data.frame(
Model = c("Linear SVM", "Tuned Linear SVM",
"Radial SVM", "Tuned Radial SVM"),
Accuracy = c(svm_linear_cm$overall['Accuracy'],
svm_linear_cm_tuned$overall['Accuracy'],
svm_radial_cm$overall['Accuracy'],
svm_tuned_cm$overall['Accuracy']),
Precision = c(svm_linear_cm$byClass['Pos Pred Value'],
svm_linear_cm_tuned$byClass['Pos Pred Value'],
svm_radial_cm$byClass['Pos Pred Value'],
svm_tuned_cm$byClass['Pos Pred Value']),
Recall = c(svm_linear_cm$byClass['Sensitivity'],
svm_linear_cm_tuned$byClass['Sensitivity'],
svm_radial_cm$byClass['Sensitivity'],
svm_tuned_cm$byClass['Sensitivity']),
F1_Score = c(svm_linear_cm$byClass['F1'],
svm_linear_cm_tuned$byClass['F1'],
svm_radial_cm$byClass['F1'],
svm_tuned_cm$byClass['F1'])
)
# Visualize performance metrics
performance_long <- gather(performance_metrics,
Metric, Value, -Model)
ggplot(performance_long, aes(x = Model, y = Value, fill = Metric)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
labs(title = "Performance Comparison of SVM Models",
y = "Score", x = "Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Performance Metrics Table Creation
performance_metrics <- data.frame(
Model = c("SVM Linear", "SVM Tuned Linear", "SVM Radial", "SVM Tuned Radial"),
Accuracy = c(svm_linear_cm$overall['Accuracy'], svm_linear_cm_tuned$overall['Accuracy'], svm_radial_cm$overall['Accuracy'], svm_tuned_cm$overall['Accuracy']),
Sensitivity = c(svm_linear_cm$byClass['Sensitivity'], svm_linear_cm_tuned$byClass['Sensitivity'], svm_radial_cm$byClass['Sensitivity'], svm_tuned_cm$byClass['Sensitivity']),
Specificity = c(svm_linear_cm$byClass['Specificity'], svm_linear_cm_tuned$byClass['Specificity'], svm_radial_cm$byClass['Specificity'], svm_tuned_cm$byClass['Specificity']),
F1_Score = c(svm_linear_cm$byClass['F1'],
svm_linear_cm_tuned$byClass['F1'],
svm_radial_cm$byClass['F1'],
svm_tuned_cm$byClass['F1'])
)
# Display Performance Metrics Table
kable(performance_metrics, format = "html") %>%
  kableExtra::kable_styling(full_width = F)

| Model | Accuracy | Sensitivity | Specificity | F1_Score |
|---|---|---|---|---|
| SVM Linear | 0.8070175 | 0.4705882 | 0.9039548 | 0.5217391 |
| SVM Tuned Linear | 0.8070175 | 0.4705882 | 0.9039548 | 0.5217391 |
| SVM Radial | 0.8070175 | 0.3137255 | 0.9491525 | 0.4210526 |
| SVM Tuned Radial | 0.7850877 | 0.4313725 | 0.8870056 | 0.4731183 |
Model Performance Comparison
The four SVM models show varied performance across accuracy, sensitivity, and specificity. Accuracy values range from 0.785 to 0.807, indicating that all models correctly classify most cases, though their ability to detect positive outcomes differs considerably.

The linear and tuned linear SVMs produced identical results (accuracy 0.807, sensitivity 0.471, specificity 0.904), as expected given that tuning selected the default cost of 1. The default radial SVM matched that accuracy (0.807) and posted the highest specificity (0.949), but it was the weakest at identifying positive cases, with a sensitivity of only 0.314. The tuned radial SVM traded some specificity (0.887) for better sensitivity (0.431), yet its overall accuracy fell to 0.785, still behind the linear models.

All models maintained high specificity (0.887 to 0.949), showing that negative cases were consistently identified correctly. The radial models in particular demonstrated strong precision on negative predictions even while struggling to detect positive cases.

Overall, the linear SVM models provide the best balance between accuracy and sensitivity, making them more effective when both positive and negative cases need to be identified reliably. The tuned radial SVM performed reasonably but did not surpass the linear models, suggesting the dataset does not contain strong non-linear patterns that the radial kernel could exploit.
Compared to previous experiments, Random Forest models achieved slightly higher accuracy (around 0.88) and better sensitivity, making them more suitable for general prediction tasks. Nevertheless, for applications where maintaining high specificity is important, the tuned radial SVM remains a reasonable alternative.
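Given the low sensitivity of every kernel on this roughly 78/22 class split, one illustrative adjustment, sketched here with an assumed weight ratio and not evaluated in this report, is e1071's class.weights argument, which raises the penalty for misclassifying the minority "yes" class:
# Sketch: weight the minority class by the approximate inverse class ratio
svm_weighted <- svm(y ~ ., data = trainData, kernel = "linear",
                    class.weights = c(no = 1, yes = 3.5),  # ~0.776 / 0.224
                    probability = TRUE)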
Review Of Articles
The following summaries demonstrate that the provided and independently located articles were read, by drawing insights, summarizing their content, and comparing their findings:
The first two articles present analyses demonstrating how decision tree ensemble methods can predict Covid-19 infections from laboratory data by handling imbalanced datasets and emphasizing appropriate machine learning techniques and evaluation metrics. The studies demonstrate the effectiveness of ensemble methods for imbalanced datasets and show age to be a critical factor in prediction models. Both articles acknowledge the challenge posed by imbalanced datasets in Covid-19 infection prediction: the first dataset contains 600 patient samples with an approximately 1:6.5 class ratio, while the second includes 5,644 patients, of whom positive cases account for approximately 10%.

Class imbalance leads to biased models, necessitating special correction techniques; the methods studied operate robustly and produce accurate outcomes on unbalanced data. Both studies employed accuracy, precision, recall, F1-measure, AUC-ROC, and AUPRC as evaluation metrics, and their results demonstrate that classifiers designed for imbalanced datasets achieve superior outcomes. Balanced random forest (RUS) outperformed other methods on AUPRC, while RUSBagging yielded superior AUROC results. Merging age information with laboratory test data enhances predictive accuracy; studies that ignored age as a significant factor failed to achieve high accuracy.
This analytical piece reviews Support Vector Machines (SVM) and Decision Trees by examining their methods and benefits while addressing their challenges and practical use cases in supervised learning. Support Vector Machines perform well in high-dimensional data spaces and provide strong resistance to overfitting whereas Decision Trees provide clear interpretability and user-friendly application despite being susceptible to overfitting. The discussion presents performance comparisons along with contextual application significance while emphasizing the vital need for informed algorithm selection in the evolving artificial intelligence domain.
https://www.coursera.org/articles/difference-between-svm-and-decision-tree
This article examines how Support Vector Machines (SVMs) and decision trees function as machine learning models for data classification and describes their respective mechanisms while assessing their benefits and challenges and practical applications. Support Vector Machines function well in spaces with many dimensions and offer versatility through various kernel functions whereas decision trees provide easy comprehension alongside flexibility with diverse data types and can be applied to classification and regression problems. The selection process between SVMs and decision trees should be based on the specific requirements of a project and its intended application.
https://scialert.net/fulltext/?doi=itj.2009.64.70
The third study compares how accurately Support Vector Machine (SVM) and Decision Tree (DT) methods classify satellite imagery data from Langkawi Island. In this image classification task, the SVM with a radial basis function kernel demonstrated superior performance, with an overall accuracy of 76.0004%, compared to the Decision Tree method, which achieved 68.7846%.
Researchers implemented Decision Tree (DT) and Support Vector Machine (SVM) algorithms to analyze SPOT 5 satellite imagery. The development of DT rules was carried out manually through analysis of Normalized Difference Vegetation Index (NDVI) and Brightness Value (BV) variables. The SVM method was implemented automatically using four kernel types: linear, polynomial, radial basis function, and sigmoid.
Conclusion
Across all experiments, SVM models delivered consistent and reliable performance but did not surpass Random Forest models from the previous homework. Linear and radial kernels performed similarly, suggesting limited nonlinear structure in the data. Literature comparisons confirm that SVMs generally excel in high-dimensional and complex feature spaces, while Decision Trees and Random Forests remain strong choices for interpretability and simplicity. In this dataset, Random Forest remains the recommended algorithm for maximizing accuracy, while SVM provides a strong, stable alternative with balanced sensitivity and specificity.
Final Conclusion & Recommendations
The SVM analysis demonstrated stable, consistent performance across both linear and radial kernels, though it did not quite match the Random Forest model from Homework #2. While tuning shifted the balance between sensitivity and specificity, overall accuracy remained essentially flat, suggesting the dataset is largely linearly separable. Random Forest maintained an edge in predictive power (≈0.88 vs. ≈0.81 for SVM), but SVM showed a reasonable balance between false positives and false negatives. Findings from the reviewed literature align with these results: across studies, SVMs tend to outperform Decision Trees in high-dimensional and nonlinear settings, while Decision Trees remain more interpretable and easier to deploy. In the context of this project's application area, where reliable prediction accuracy is key, SVM represents a strong, stable alternative to tree-based models. For operational use, combining SVM's precision with the interpretability of Decision Trees or Random Forests could yield the most practical balance.