This section presents a comparison of SVM models with cross-validation, followed by an integrated analysis of Decision Trees, Random Forest, and AdaBoost from Assignment 2. Three evaluation metrics — Accuracy, F1 Score, and AUC — are used to identify the best predictive model for bank term deposit subscription.
Tuned C and gamma values captured non-linear transitions without overfitting, yet no kernelized configuration beat the linear kernel with C = 0.01 in F1 or AUC.

Yes. The consistent performance of linear SVMs, especially with C = 0.01, suggests the dataset becomes linearly separable after:

- One-hot encoding
- SMOTE balancing
- Feature scaling (Box-Cox, binning)
| Model Type | Generalization | Interpretability | Runtime Cost | Business Fit |
|---|---|---|---|---|
| Linear (C = 0.01) | Excellent | High | Low | Best |
| Sigmoid (Tuned) | Strong | Moderate | Medium | Strong |
| Radial / Poly | Poor Recall | Low | High | Not Recommended |
| Model | Accuracy | F1 | AUC |
|---|---|---|---|
| Random Forest (Baseline) | 0.9126 | 0.6218 | 0.9330 |
| AdaBoost (Top 10 Features) | 0.8814 | 0.6188 | 0.9359 |
| SVM (CV): Linear (C = 0.01) | 0.8678 | 0.5928 | 0.9439 |
| SVM (CV): Sigmoid (Grid) | 0.8717 | 0.5959 | 0.9423 |
| RF: ntree = 500 | 0.9096 | 0.5974 | 0.9299 |
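To make the cross-model comparison easier to reuse (for example, to sort or chart F1 and AUC), the reported figures can be collected into a data frame. This is a small convenience sketch built from the numbers in the table above, not output from the original analysis:

# Assemble the reported metrics (from the table above) for sorting/plotting
model_summary <- data.frame(
  Model = c("Random Forest (Baseline)", "AdaBoost (Top 10 Features)",
            "SVM (CV): Linear (C = 0.01)", "SVM (CV): Sigmoid (Grid)",
            "RF: ntree = 500"),
  Accuracy = c(0.9126, 0.8814, 0.8678, 0.8717, 0.9096),
  F1       = c(0.6218, 0.6188, 0.5928, 0.5959, 0.5974),
  AUC      = c(0.9330, 0.9359, 0.9439, 0.9423, 0.9299)
)
model_summary[order(-model_summary$AUC), ]  # rank models by AUC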
All models were applied to a binary classification problem (term deposit: yes/no). Classification is the correct modeling strategy.
| Algorithm | Best Use | Comments |
|---|---|---|
| SVM | Imbalanced Classification | Competitive F1 and AUC with low cost |
| Decision Tree | Simple Interpretability | Useful for rule-based insights |
| Random Forest | High Accuracy | Strong performance, but less interpretable |
| AdaBoost | Balanced Classification | Great with curated features |
Yes. The comparison matrix above supports these algorithm-to-use-case pairings.
Loading Libraries
library(readr)
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(corrplot)
## corrplot 0.94 loaded
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(ggthemes)
library(purrr)
library(tidyr)
library(readr)
# Load necessary libraries
library(dplyr)
bank_data <- read_delim("bank_data.csv", delim = ",", col_types = cols())
bank_data <- bank_data %>%
mutate(y = factor(y, levels = c(0, 1)))
# Columns that should be factors
factor_vars <- c("y", "previous", "campaign_bin")
# Columns that should be integers
int_vars <- c(
"contact_cellular", "contact_telephone", "campaign_binary_High", "campaign_binary_Low",
"default_no", "default_unknown", "default_yes",
"education_basic_4y", "education_basic_6y", "education_basic_9y",
"education_high_school", "education_illiterate", "education_professional_course",
"education_university_degree", "education_unknown",
"housing_1", "housing_3",
"job_admin_", "job_blue_collar", "job_entrepreneur", "job_housemaid",
"job_management", "job_retired", "job_self_employed", "job_services",
"job_technician", "job_unemployed", "job_Other",
"loan_1", "loan_3",
"marital_divorced", "marital_married", "marital_single", "marital_unknown",
"month_apr", "month_aug", "month_dec", "month_jul", "month_jun",
"month_mar", "month_may", "month_nov", "month_oct", "month_sep",
"loan_housing_combo_1_1", "loan_housing_combo_1_3",
"loan_housing_combo_3_1", "loan_housing_combo_3_3",
"poutcome_failure", "poutcome_nonexistent", "poutcome_success"
)
# Apply conversions
bank_data <- bank_data %>%
mutate(across(all_of(factor_vars), as.factor)) %>%
mutate(across(all_of(int_vars), as.integer))
# Confirm
glimpse(bank_data)
## Rows: 4,119
## Columns: 68
## $ y <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ campaign <dbl> 2, 4, 1, 3, 1, 3, 4, 2, 1, 1, 1, 1, 2, 2…
## $ previous <fct> 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0…
## $ duration_boxcox <dbl> 10.205533, 9.364243, 8.385498, 3.618223,…
## $ campaign_log <dbl> 1.0986123, 1.6094379, 0.6931472, 1.38629…
## $ campaign_reciprocal <dbl> 0.5000000, 0.2500000, 1.0000000, 0.33333…
## $ campaign_bin <fct> 2, 4, 1, 3, 1, 3, 4, 2, 1, 1, 1, 1, 2, 2…
## $ poutcome_bin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ contacted_before <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ contact_cellular <int> 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0…
## $ contact_telephone <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1…
## $ campaign_binary_High <int> 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
## $ campaign_binary_Low <int> 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1…
## $ default_no <int> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1…
## $ default_unknown <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0…
## $ default_yes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ education_basic_4y <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1…
## $ education_basic_6y <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ education_basic_9y <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
## $ education_high_school <int> 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
## $ education_illiterate <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ education_professional_course <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ education_university_degree <int> 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0…
## $ education_unknown <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ housing_1 <int> 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0…
## $ housing_3 <int> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1…
## $ job_admin_ <int> 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0…
## $ job_blue_collar <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1…
## $ job_entrepreneur <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ job_housemaid <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_management <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_retired <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_self_employed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
## $ job_services <int> 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0…
## $ job_technician <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_unemployed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_Other <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ loan_1 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ loan_3 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ marital_divorced <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ marital_married <int> 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1…
## $ marital_single <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0…
## $ marital_unknown <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_apr <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_aug <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_dec <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_jul <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
## $ month_jun <int> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
## $ month_mar <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_may <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0…
## $ month_nov <int> 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0…
## $ month_oct <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_sep <int> 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
## $ loan_housing_combo_1_1 <int> 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0…
## $ loan_housing_combo_1_3 <int> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1…
## $ loan_housing_combo_3_1 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ loan_housing_combo_3_3 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome_failure <int> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0…
## $ poutcome_nonexistent <int> 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1…
## $ poutcome_success <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ z_age <dbl> -0.98063272, -0.10797835, -1.46544070, -…
## $ z_duration <dbl> 0.9038420, 0.3502577, -0.1169518, -0.941…
## $ z_previous_contacts_ratio <dbl> -0.331151, -0.331151, -0.331151, -0.3311…
## $ z_nr_employed <dbl> -0.9146683, 0.3328221, 0.8364335, 0.8364…
## $ minmax_campaign_boxcox <dbl> 0.3529412, 0.6352941, 0.0000000, 0.52941…
## $ minmax_campaign_sqrt <dbl> 0.08425688, 0.20341411, 0.00000000, 0.14…
## $ minmax_cons_price_idx <dbl> 0.2696804, 0.6987529, 0.8823071, 0.88230…
## $ robust_cons_conf_idx <dbl> -0.69841270, 0.85714286, 0.00000000, 0.0…
For predictive modeling, we can use simple random sampling or stratified random sampling to create training and test datasets.
This method selects data randomly without replacement to create the training and test datasets, ensuring no duplicates.
# Set seed for reproducibility
set.seed(1234)
# Define training sample size (e.g., 75% of the data)
sample_size <- round(nrow(bank_data) * 0.75)
# Create sample set
sample_set <- sample(nrow(bank_data), sample_size, replace = FALSE)
# Split data into training and test sets
train_data <- bank_data[sample_set, ]
test_data <- bank_data[-sample_set, ]
# Verify class distribution remains consistent
print(round(prop.table(table(train_data$y)) * 100, 2))
##
## 0 1
## 88.83 11.17
print(round(prop.table(table(test_data$y)) * 100, 2))
##
## 0 1
## 89.71 10.29
Since y is a categorical variable, we should ensure that both training and test sets maintain the same proportion of classes.
# Load caret package
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
# Stratified sampling with 75% training data
set.seed(1234)
trainIndex <- createDataPartition(bank_data$y, p = 0.75, list = FALSE)
# Split data based on stratified sampling
train_data <- bank_data[trainIndex, ]
test_data <- bank_data[-trainIndex, ]
# Verify class distribution remains consistent
round(prop.table(table(train_data$y)) * 100, 2)
##
## 0 1
## 89.03 10.97
round(prop.table(table(test_data$y)) * 100, 2)
##
## 0 1
## 89.12 10.88
The class distribution in the training dataset closely mirrors that of the original dataset, with approximately 89% “no” responses and 11% “yes” responses in both the training set (89.03% / 10.97%) and the test set (89.12% / 10.88%). This indicates that the sampling process was performed correctly, preserving the proportion of classes in the response variable. Maintaining a similar distribution is crucial because it ensures that the model trained on the sample will generalize well to the full dataset, reducing bias and improving predictive performance.
# Load necessary libraries
library(themis)
## Loading required package: recipes
##
## Attaching package: 'recipes'
## The following object is masked from 'package:stringr':
##
## fixed
## The following object is masked from 'package:stats':
##
## step
library(dplyr)
library(recipes)
# Step 1: Ensure target is factor
train_data <- train_data %>%
mutate(y = as.factor(y))
# Step 2: Backup factor columns to restore later
factor_cols <- names(train_data)[sapply(train_data, is.factor) & names(train_data) != "y"]
factor_levels <- lapply(train_data[factor_cols], levels)
# Step 3: Temporarily convert factor predictors to numeric (required for SMOTE)
y_train <- train_data$y
train_data <- train_data %>%
dplyr::select(-y) %>%
mutate(across(where(is.factor), ~ as.numeric(as.factor(.)))) %>%
mutate(y = y_train)
# Step 4: Define SMOTE recipe
set.seed(1234)
smote_recipe <- recipe(y ~ ., data = train_data) %>%
step_smote(y, over_ratio = 1) %>%
prep()
# Step 5: Apply SMOTE
train_data_smote <- juice(smote_recipe)
# Step 6: Restore factor columns using original labels safely
for (col in factor_cols) {
# Extract original labels
labels <- factor_levels[[col]]
# Get current numeric values (possibly fractional due to SMOTE)
numeric_vals <- train_data_smote[[col]]
# Round values to nearest integer
rounded_vals <- round(numeric_vals)
# Handle out-of-range values
rounded_vals[!(rounded_vals %in% seq_along(labels))] <- NA
# Convert to factor with original labels
train_data_smote[[col]] <- factor(labels[rounded_vals], levels = labels)
}
# Step 7: Confirm structure
table(train_data_smote$y)
##
## 0 1
## 2751 2751
glimpse(train_data_smote)
## Rows: 5,502
## Columns: 68
## $ campaign <dbl> 4, 1, 3, 1, 3, 4, 1, 1, 1, 1, 2, 2, 2, 6…
## $ previous <fct> 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ duration_boxcox <dbl> 9.364243, 8.385498, 3.618223, 5.622899, …
## $ campaign_log <dbl> 1.6094379, 0.6931472, 1.3862944, 0.69314…
## $ campaign_reciprocal <dbl> 0.2500000, 1.0000000, 0.3333333, 1.00000…
## $ campaign_bin <fct> 4, 1, 3, 1, 3, 4, 1, 1, 1, 1, 2, 2, 2, 6…
## $ poutcome_bin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ contacted_before <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ contact_cellular <dbl> 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1…
## $ contact_telephone <dbl> 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0…
## $ campaign_binary_High <dbl> 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1…
## $ campaign_binary_Low <dbl> 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0…
## $ default_no <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1…
## $ default_unknown <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0…
## $ default_yes <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ education_basic_4y <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
## $ education_basic_6y <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ education_basic_9y <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ education_high_school <dbl> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
## $ education_illiterate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ education_professional_course <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ education_university_degree <dbl> 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1…
## $ education_unknown <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ housing_1 <dbl> 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1…
## $ housing_3 <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0…
## $ job_admin_ <dbl> 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1…
## $ job_blue_collar <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ job_entrepreneur <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_housemaid <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_management <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_retired <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_self_employed <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
## $ job_services <dbl> 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0…
## $ job_technician <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_unemployed <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ job_Other <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ loan_1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ loan_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ marital_divorced <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1…
## $ marital_married <dbl> 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0…
## $ marital_single <dbl> 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0…
## $ marital_unknown <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_apr <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_aug <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_dec <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_jul <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1…
## $ month_jun <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_mar <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_may <dbl> 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0…
## $ month_nov <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ month_oct <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ month_sep <dbl> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0…
## $ loan_housing_combo_1_1 <dbl> 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1…
## $ loan_housing_combo_1_3 <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0…
## $ loan_housing_combo_3_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ loan_housing_combo_3_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome_failure <dbl> 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome_nonexistent <dbl> 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1…
## $ poutcome_success <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ z_age <dbl> -0.10797835, -1.46544070, -0.20493995, 0…
## $ z_duration <dbl> 0.3502577, -0.1169518, -0.9414391, -0.78…
## $ z_previous_contacts_ratio <dbl> -0.331151, -0.331151, -0.331151, -0.3311…
## $ z_nr_employed <dbl> 0.3328221, 0.8364335, 0.8364335, 0.39797…
## $ minmax_campaign_boxcox <dbl> 0.6352941, 0.0000000, 0.5294118, 0.00000…
## $ minmax_campaign_sqrt <dbl> 0.20341411, 0.00000000, 0.14890946, 0.00…
## $ minmax_cons_price_idx <dbl> 0.6987529, 0.8823071, 0.8823071, 0.38932…
## $ robust_cons_conf_idx <dbl> 0.85714286, 0.00000000, 0.00000000, -0.0…
## $ y <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
# print(colSums(is.na(train_data_smote)))
The original data contains far more no responses than yes responses: roughly 89% of clients are class 0 (no term deposit), and only ~11% are class 1 (subscribed).

Establish a baseline using a linear kernel SVM and evaluate its generalization using 5-fold cross-validation on the Bank Marketing dataset, which predicts whether a client will subscribe to a term deposit.

- Used trainControl with 5-fold cross-validation and the default C = 1
- Cross-validated Accuracy, F1-score, and AUC evaluated on the unseen test set
library(pROC)  # roc()/auc() are used below for AUC; loaded here before first use
train_data_smote$y <- factor(
  ifelse(as.numeric(as.character(train_data_smote$y)) == 1, "yes", "no"),
  levels = c("no", "yes")
)
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE, summaryFunction = twoClassSummary)
suppressWarnings({
svm_linear_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmLinear",
trControl = ctrl,
metric = "ROC"
)
})
## line search fails -1.33988 0.1330218 1.539995e-05 -2.726239e-06 -2.645452e-08 1.368949e-08 -4.447191e-13
true_labels <- factor(
ifelse(as.numeric(as.character(test_data$y)) == 1, "yes", "no"),
levels = c("no", "yes")
)
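# NOTE (added sketch): `test_data_svm`, used for all predictions below, is not
# defined in this section. A minimal assumed preparation, mirroring the
# label recoding applied to the training data, would be:
test_data_svm <- test_data
test_data_svm$y <- factor(
  ifelse(as.numeric(as.character(test_data_svm$y)) == 1, "yes", "no"),
  levels = c("no", "yes")
)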
pred_linear <- predict(svm_linear_cv, newdata = test_data_svm)
prob_linear <- predict(svm_linear_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_linear <- factor(pred_linear, levels = c("no", "yes"))
true_labels <- factor(true_labels, levels = c("no", "yes"))
conf_mat_linear <- confusionMatrix(pred_linear, true_labels, positive = "yes")
roc_obj_linear <- roc(true_labels, prob_linear)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_linear <- auc(roc_obj_linear)
print(conf_mat_linear)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 799 16
## yes 118 96
##
## Accuracy : 0.8698
## 95% CI : (0.8477, 0.8897)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9862
##
## Kappa : 0.5204
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.85714
## Specificity : 0.87132
## Pos Pred Value : 0.44860
## Neg Pred Value : 0.98037
## Prevalence : 0.10884
## Detection Rate : 0.09329
## Detection Prevalence : 0.20797
## Balanced Accuracy : 0.86423
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_linear$overall["Accuracy"], "\n")
## Accuracy: 0.8697765
cat("F1 Score:", conf_mat_linear$byClass["F1"], "\n")
## F1 Score: 0.5889571
cat("AUC:", auc_val_linear, "\n")
## AUC: 0.9358837
While the accuracy appears high, accuracy alone is not sufficient given the class imbalance. The F1 Score of 0.5890 and AUC of 0.9359 show that the model does a strong job of distinguishing between positive and negative classes and reasonably balances precision and recall for the minority class. This suggests the baseline linear SVM is a robust starting point, though there is room for improvement — especially in lifting recall for subscribed clients (term deposit conversions), which are the strategic focus in the banking context. One cheap lever for that is the decision threshold, sketched below.
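Lowering the probability cutoff trades precision for recall without retraining. The sketch below reuses prob_linear and true_labels from above; the 0.35 cutoff is a hypothetical value that would need tuning on a validation set:

# Hypothetical lower cutoff: trade precision for recall on the positive class
cutoff <- 0.35
pred_cutoff <- factor(ifelse(prob_linear >= cutoff, "yes", "no"),
                      levels = c("no", "yes"))
confusionMatrix(pred_cutoff, true_labels, positive = "yes")$byClass[
  c("Sensitivity", "Precision", "F1")]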
Evaluate how a lower regularization parameter (C = 0.01) influences performance, particularly in improving generalization and minority-class prediction on the term deposit classification task.

- Set C to 0.01 via tuneGrid
- Cross-validated Accuracy, F1-score, and AUC evaluated on the unseen test set
suppressWarnings({
svm_linear_lowC_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmLinear",
trControl = ctrl,
tuneGrid = data.frame(C = 0.01),
metric = "ROC"
)
})
pred_linear_lowC <- predict(svm_linear_lowC_cv, newdata = test_data_svm)
prob_linear_lowC <- predict(svm_linear_lowC_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_linear_lowC <- factor(pred_linear_lowC, levels = c("no", "yes"))
conf_mat_linear_lowC <- confusionMatrix(pred_linear_lowC, true_labels, positive = "yes")
roc_obj_linear_lowC <- roc(true_labels, prob_linear_lowC)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_linear_lowC <- auc(roc_obj_linear_lowC)
print(conf_mat_linear_lowC)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 799 16
## yes 118 96
##
## Accuracy : 0.8698
## 95% CI : (0.8477, 0.8897)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9862
##
## Kappa : 0.5204
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.85714
## Specificity : 0.87132
## Pos Pred Value : 0.44860
## Neg Pred Value : 0.98037
## Prevalence : 0.10884
## Detection Rate : 0.09329
## Detection Prevalence : 0.20797
## Balanced Accuracy : 0.86423
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_linear_lowC$overall["Accuracy"], "\n")
## Accuracy: 0.8697765
cat("F1 Score:", conf_mat_linear_lowC$byClass["F1"], "\n")
## F1 Score: 0.5889571
cat("AUC:", auc_val_linear_lowC, "\n")
## AUC: 0.9396616
Using a low regularization parameter (C = 0.01) softens the margin, helping the model generalize to new data and avoid overfitting. The test confusion matrix is identical to the baseline’s (F1 Score = 0.5890), while the AUC improves slightly to 0.9397 — both suggesting balanced and reliable performance.
Most importantly, recall for the minority class (Sensitivity = 85.71%) is strong, showing the model’s ability to detect potential term deposit subscribers. That makes this version of the model especially valuable in real-world marketing applications, where identifying responders is more important than overall accuracy.
In summary, this low-regularization linear SVM offers a strong trade-off between precision and recall and is well-suited for imbalanced classification problems like this bank campaign dataset.
Assess the impact of a higher regularization parameter (C = 10) on linear SVM performance, particularly to see if it reduces margin violations at the expense of potential overfitting.

- Set C to 10 using tuneGrid
- Cross-validated Accuracy, F1-score, and AUC evaluated on the test dataset
suppressWarnings({
svm_linear_highC_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmLinear",
trControl = ctrl,
tuneGrid = data.frame(C = 10),
metric = "ROC"
)
})
pred_linear_highC <- predict(svm_linear_highC_cv, newdata = test_data_svm)
prob_linear_highC <- predict(svm_linear_highC_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_linear_highC <- factor(pred_linear_highC, levels = c("no", "yes"))
conf_mat_linear_highC <- confusionMatrix(pred_linear_highC, true_labels, positive = "yes")
roc_obj_linear_highC <- roc(true_labels, prob_linear_highC)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_linear_highC <- auc(roc_obj_linear_highC)
print(conf_mat_linear_highC)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 797 16
## yes 120 96
##
## Accuracy : 0.8678
## 95% CI : (0.8456, 0.8879)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9916
##
## Kappa : 0.516
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.85714
## Specificity : 0.86914
## Pos Pred Value : 0.44444
## Neg Pred Value : 0.98032
## Prevalence : 0.10884
## Detection Rate : 0.09329
## Detection Prevalence : 0.20991
## Balanced Accuracy : 0.86314
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_linear_highC$overall["Accuracy"], "\n")
## Accuracy: 0.8678328
cat("F1 Score:", conf_mat_linear_highC$byClass["F1"], "\n")
## F1 Score: 0.5853659
cat("AUC:", auc_val_linear_highC, "\n")
## AUC: 0.9359227
Raising the regularization strength to C = 10 did not help: Accuracy (0.8678) and F1 (0.5854) dipped slightly below the baseline (C = 1), while AUC was essentially unchanged (0.9359) and sensitivity and specificity stayed balanced.
This suggests a harder margin buys nothing here — the model already fits the (effectively linearly separable) training data well, and the extra penalty on margin violations only nudges a few borderline test cases the wrong way. Given that C = 10 trails both C = 1 and C = 0.01, tuning C upward is not operationally significant for real-world deployment, especially when model simplicity and stability are desired.
Overall, this high-regularization linear SVM is a solid performer but no better than its lower-C counterparts, and pushing C beyond 10 would likely yield diminishing returns, as the side-by-side table below makes concrete.
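Since all three linear-kernel result objects remain in scope, a quick table puts the runs side by side (a convenience sketch, assuming the conf_mat_linear* and auc_val_linear* objects created above):

# Tabulate the three linear-kernel runs side by side
linear_compare <- data.frame(
  C        = c(1, 0.01, 10),
  Accuracy = c(conf_mat_linear$overall["Accuracy"],
               conf_mat_linear_lowC$overall["Accuracy"],
               conf_mat_linear_highC$overall["Accuracy"]),
  F1       = c(conf_mat_linear$byClass["F1"],
               conf_mat_linear_lowC$byClass["F1"],
               conf_mat_linear_highC$byClass["F1"]),
  AUC      = c(as.numeric(auc_val_linear),
               as.numeric(auc_val_linear_lowC),
               as.numeric(auc_val_linear_highC))
)
print(linear_compare)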
Evaluate the performance of an SVM with a Radial Basis Function (RBF) kernel using default gamma settings. The goal is to assess how non-linear kernel transformations handle the structure of the bank marketing dataset.
- Used 5-fold CV with consistent data splits and metrics
- Cross-validated Accuracy, F1-score, and AUC evaluated on the test dataset
suppressWarnings({
svm_radial_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmRadial",
trControl = ctrl,
metric = "ROC"
)
})
pred_radial <- predict(svm_radial_cv, newdata = test_data_svm)
prob_radial <- predict(svm_radial_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_radial <- factor(pred_radial, levels = c("no", "yes"))
conf_mat_radial <- confusionMatrix(pred_radial, true_labels, positive = "yes")
roc_obj_radial <- roc(true_labels, prob_radial)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_radial <- auc(roc_obj_radial)
print(conf_mat_radial)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 849 39
## yes 68 73
##
## Accuracy : 0.896
## 95% CI : (0.8757, 0.914)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.329989
##
## Kappa : 0.5187
##
## Mcnemar's Test P-Value : 0.006792
##
## Sensitivity : 0.65179
## Specificity : 0.92585
## Pos Pred Value : 0.51773
## Neg Pred Value : 0.95608
## Prevalence : 0.10884
## Detection Rate : 0.07094
## Detection Prevalence : 0.13703
## Balanced Accuracy : 0.78882
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_radial$overall["Accuracy"], "\n")
## Accuracy: 0.8960155
cat("F1 Score:", conf_mat_radial$byClass["F1"], "\n")
## F1 Score: 0.5770751
cat("AUC:", auc_val_radial, "\n")
## AUC: 0.9213565
The default RBF kernel delivered the highest overall accuracy so far (89.6%) with strong specificity (92.6%), but at a real cost to recall: sensitivity fell to 65.2%, well below the ~86% of the linear models, dragging balanced accuracy down to 78.9%. For identifying potential term deposit subscribers, that trade-off is unfavorable.
With an F1 score of 0.5771 and an AUC of 0.9214, the model still distinguishes between classes reasonably well across thresholds, and its low false positive rate could suit resource-constrained campaigns where contacting the wrong customer is costly.
In summary, the default RBF kernel trades detection capability for control of false positives; it is a viable choice only when precision matters more than reaching every likely subscriber.
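For context before fixing the kernel width manually in the next experiment: caret’s svmRadial chooses sigma (the RBF gamma) with kernlab::sigest(), so a hand-picked sigma = 0.1 can sit far from the data-driven estimate. A quick check (sketch; kernlab is already a dependency of the SVM models used above):

library(kernlab)
# Returns the 0.1/0.5/0.9 quantiles of a data-driven sigma estimate;
# compare these with the manually fixed sigma = 0.1 used in the next model
sigest(y ~ ., data = train_data_smote)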
Evaluate whether explicitly setting gamma = 0.1 in an RBF kernel improves classification performance over the default gamma. This experiment helps understand the effect of adjusting the kernel’s sensitivity to feature space separation.
- Set gamma = 0.1 (caret’s sigma) and used C = 1 with the same trainControl
- Cross-validated Accuracy, F1-score, and AUC calculated on the test dataset
suppressWarnings({
svm_radial_gamma_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmRadial",
trControl = ctrl,
tuneGrid = expand.grid(C = 1, sigma = 0.1),
metric = "ROC"
)
})
pred_radial_gamma <- predict(svm_radial_gamma_cv, newdata = test_data_svm)
prob_radial_gamma <- predict(svm_radial_gamma_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_radial_gamma <- factor(pred_radial_gamma, levels = c("no", "yes"))
conf_mat_radial_gamma <- confusionMatrix(pred_radial_gamma, true_labels, positive = "yes")
roc_obj_radial_gamma <- roc(true_labels, prob_radial_gamma)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_radial_gamma <- auc(roc_obj_radial_gamma)
print(conf_mat_radial_gamma)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 896 108
## yes 21 4
##
## Accuracy : 0.8746
## 95% CI : (0.8528, 0.8943)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9579
##
## Kappa : 0.0194
##
## Mcnemar's Test P-Value : 3.679e-14
##
## Sensitivity : 0.035714
## Specificity : 0.977099
## Pos Pred Value : 0.160000
## Neg Pred Value : 0.892430
## Prevalence : 0.108844
## Detection Rate : 0.003887
## Detection Prevalence : 0.024295
## Balanced Accuracy : 0.506407
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_radial_gamma$overall["Accuracy"], "\n")
## Accuracy: 0.8746356
cat("F1 Score:", conf_mat_radial_gamma$byClass["F1"], "\n")
## F1 Score: 0.05839416
cat("AUC:", auc_val_radial_gamma, "\n")
## AUC: 0.7555986
Fixing gamma at 0.1 kept specificity (97.7%) and headline accuracy high, but recall collapsed: sensitivity fell to 3.6%, F1 to 0.058, and balanced accuracy to barely above chance (0.506). The model almost always predicts non-subscriber and misses nearly every true subscriber — unacceptable if the goal is maximum outreach in marketing.
The AUC of 0.756 — down sharply from 0.921 with the default gamma — confirms that this gamma over-localizes the kernel for this feature space rather than sharpening it.
This configuration is therefore hard to recommend for either risk-averse or acquisition-focused campaigns; the data-driven sigma chosen by caret (see the sigest() check above) is clearly the better starting point.
Assess the effect of combining low regularization strength (C = 0.01) with moderate kernel flexibility (gamma = 0.1) in an RBF kernel. This experiment tests a soft-margin configuration that tolerates training misclassifications to possibly enhance generalization on unseen data.

- Set C = 0.01 and gamma = 0.1 in the radial kernel
- Cross-validated Accuracy, F1-score, and AUC calculated on the test dataset
suppressWarnings({
svm_radial_soft_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmRadial",
trControl = ctrl,
tuneGrid = expand.grid(C = 0.01, sigma = 0.1),
metric = "ROC"
)
})
pred_radial_soft <- predict(svm_radial_soft_cv, newdata = test_data_svm)
prob_radial_soft <- predict(svm_radial_soft_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_radial_soft <- factor(pred_radial_soft, levels = c("no", "yes"))
conf_mat_radial_soft <- confusionMatrix(pred_radial_soft, true_labels, positive = "yes")
roc_obj_radial_soft <- roc(true_labels, prob_radial_soft)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_radial_soft <- auc(roc_obj_radial_soft)
print(conf_mat_radial_soft)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 898 110
## yes 19 2
##
## Accuracy : 0.8746
## 95% CI : (0.8528, 0.8943)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9579
##
## Kappa : -0.0044
##
## Mcnemar's Test P-Value : 2.299e-15
##
## Sensitivity : 0.017857
## Specificity : 0.979280
## Pos Pred Value : 0.095238
## Neg Pred Value : 0.890873
## Prevalence : 0.108844
## Detection Rate : 0.001944
## Detection Prevalence : 0.020408
## Balanced Accuracy : 0.498569
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_radial_soft$overall["Accuracy"], "\n")
## Accuracy: 0.8746356
cat("F1 Score:", conf_mat_radial_soft$byClass["F1"], "\n")
## F1 Score: 0.03007519
cat("AUC:", auc_val_radial_soft, "\n")
## AUC: 0.7258043
This configuration performed worst of all: sensitivity fell to 1.8%, F1 to 0.030, and AUC to 0.726, with a Kappa of -0.0044 — no better than always predicting the majority class. Combining heavy regularization (C = 0.01) with the already-too-large gamma = 0.1 leaves the model too constrained to learn the minority class at all.
Because false negatives are the costly error in targeted marketing — where missing an interested client means a lost conversion — this soft-margin RBF variant is unusable as configured. If an RBF kernel is desired, the default (data-driven) sigma with moderate C is the better starting point.
Evaluate the baseline performance of the sigmoid kernel without hyperparameter tuning to establish a reference point for later grid-searched improvements.
- Fit e1071::svm() directly with kernel = "sigmoid" and default cost/gamma (no cross-validation at this stage)
- Accuracy, F1-score, and AUC calculated on the test dataset
# --- Manual Sigmoid Kernel (No Grid Search) ---
library(e1071)  # svm() is used below; load before first use
suppressWarnings({
svm_sigmoid_manual <- svm(
y ~ ., data = train_data_smote,
kernel = "sigmoid",
probability = TRUE
)
})
# --- Predict on Test Set ---
pred_sigmoid <- predict(svm_sigmoid_manual, newdata = test_data_svm, probability = TRUE)
prob_sigmoid <- attr(pred_sigmoid, "probabilities")[, "yes"]
pred_sigmoid <- factor(pred_sigmoid, levels = c("no", "yes"))
# --- Evaluate ---
conf_mat_sigmoid_manual <- confusionMatrix(pred_sigmoid, true_labels, positive = "yes")
roc_obj_sigmoid <- roc(true_labels, prob_sigmoid)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_sigmoid_manual <- auc(roc_obj_sigmoid)
# --- Optional Print ---
print(conf_mat_sigmoid_manual)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 772 11
## yes 145 101
##
## Accuracy : 0.8484
## 95% CI : (0.825, 0.8698)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.4876
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.90179
## Specificity : 0.84188
## Pos Pred Value : 0.41057
## Neg Pred Value : 0.98595
## Prevalence : 0.10884
## Detection Rate : 0.09815
## Detection Prevalence : 0.23907
## Balanced Accuracy : 0.87183
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_sigmoid_manual$overall["Accuracy"], "\n")
## Accuracy: 0.8483965
cat("F1 Score:", conf_mat_sigmoid_manual$byClass["F1"], "\n")
## F1 Score: 0.5642458
cat("AUC:", auc_val_sigmoid_manual, "\n")
## AUC: 0.9355234
With Accuracy = 84.84%, F1 Score = 0.5642, and AUC = 0.9355, the default sigmoid kernel model delivers a solid baseline. The sensitivity of 90.2% shows excellent recall of positive cases, making this a promising starting point despite no tuning. However, the low precision (41.1%) implies a high false positive rate, which could be costly depending on business goals. The next step is to fine-tune cost and gamma through grid search for better balance.
Optimize the sigmoid kernel using a 5-fold cross-validated grid search over a range of C and gamma values, then refit the best model on the full training data and evaluate it on the test set.

- Grid: C = {0.01, 0.1, 1, 10} and gamma = {0.001, 0.01, 0.1, 1}
- A model is trained and scored for every C and gamma combination
- Cross-validated Accuracy, F1-score, and AUC calculated on the test dataset
# --- Sigmoid Kernel Grid Search with 5-Fold CV ---
library(e1071)
library(caret)
library(pROC)
set.seed(123)
# Define parameter grid
cost_values <- c(0.01, 0.1, 1, 10)
gamma_values <- c(0.001, 0.01, 0.1, 1)
# Store results
grid_results <- list()
# Outer loop for grid search
for (C in cost_values) {
for (gamma in gamma_values) {
folds <- createFolds(train_data_smote$y, k = 5)
cv_results <- lapply(folds, function(idx) {
train_fold <- train_data_smote[-idx, ]
test_fold <- train_data_smote[idx, ]
# Scaling
pre_proc <- preProcess(train_fold[, -which(names(train_fold) == "y")], method = c("center", "scale"))
train_fold_scaled <- train_fold
train_fold_scaled[, -which(names(train_fold) == "y")] <- predict(pre_proc, train_fold[, -which(names(train_fold) == "y")])
test_fold_scaled <- test_fold
test_fold_scaled[, -which(names(test_fold) == "y")] <- predict(pre_proc, test_fold[, -which(names(test_fold) == "y")])
# Train model with current C and gamma
model <- svm(y ~ ., data = train_fold_scaled,
kernel = "sigmoid", probability = TRUE,
cost = C, gamma = gamma)
# Predict
preds <- predict(model, test_fold_scaled, probability = TRUE)
probs <- attr(preds, "probabilities")[, "yes"]
# Evaluate
cm <- confusionMatrix(preds, test_fold_scaled$y, positive = "yes")
auc_val <- auc(roc(test_fold_scaled$y, probs))
list(Accuracy = cm$overall["Accuracy"],
F1 = cm$byClass["F1"],
AUC = as.numeric(auc_val))
})
# Aggregate
cv_summary_df <- do.call(rbind, lapply(cv_results, as.data.frame))
cv_summary_df[] <- lapply(cv_summary_df, as.numeric)
cv_means <- colMeans(cv_summary_df)
# Store with param info
grid_results[[paste0("C=", C, "_Gamma=", gamma)]] <- c(C = C, Gamma = gamma, cv_means)
}
}
## (Repeated per-fold console output omitted. Across the 16 grid cells x 5 folds,
## the same messages recur: preProcess()/svm() warnings that 'default_yes' and/or
## 'education_illiterate' have zero variance in a given fold and cannot be scaled,
## and pROC's "Setting levels: control = no, case = yes" / "Setting direction:
## controls < cases" notices.)
# Convert to data frame for easy comparison
grid_results_df <- do.call(rbind, grid_results)
grid_results_df <- as.data.frame(grid_results_df)
rownames(grid_results_df) <- NULL
grid_results_df <- grid_results_df[order(-grid_results_df$F1, -grid_results_df$AUC), ]
# View best configs
print(grid_results_df)
## C Gamma Accuracy F1 AUC
## 13 10.00 0.001 0.8854974 0.8877753 0.9440547
## 9 1.00 0.001 0.8787745 0.8808091 0.9440192
## 6 0.10 0.010 0.8760448 0.8783151 0.9409506
## 10 1.00 0.010 0.8598695 0.8614094 0.9280609
## 5 0.10 0.001 0.8387811 0.8413219 0.9125622
## 2 0.01 0.010 0.8347847 0.8377092 0.9107784
## 14 10.00 0.010 0.8144305 0.8133621 0.8796752
## 3 0.01 0.100 0.8100702 0.8077363 0.8987399
## 7 0.10 0.100 0.7066520 0.7013953 0.7832971
## 11 1.00 0.100 0.7037400 0.7005368 0.7717057
## 15 10.00 0.100 0.7006571 0.6998220 0.7718318
## 4 0.01 1.000 0.6931982 0.6893897 0.7723180
## 12 1.00 1.000 0.6519419 0.6497961 0.6953152
## 8 0.10 1.000 0.6495842 0.6458153 0.7053477
## 16 10.00 1.000 0.6486728 0.6457405 0.6982548
## 1 0.01 0.001 0.5641820 NA 0.8657107
best_config <- grid_results_df[1, ]
cat("Best Parameters:\n")
## Best Parameters:
cat("C =", best_config$C, ", Gamma =", best_config$Gamma, "\n")
## C = 10 , Gamma = 0.001
cat("F1 Score:", best_config$F1, "\n")
## F1 Score: 0.8877753
cat("AUC:", best_config$AUC, "\n")
## AUC: 0.9440547
cat("Accuracy:", best_config$Accuracy, "\n")
## Accuracy: 0.8854974
# --- Final Evaluation for Best Sigmoid Grid Search Model ---
best_C_sigmoid <- best_config$C
best_gamma_sigmoid <- best_config$Gamma
suppressWarnings({
svm_sigmoid_best <- svm(
y ~ ., data = train_data_smote,
kernel = "sigmoid", probability = TRUE,
cost = best_C_sigmoid,
gamma = best_gamma_sigmoid
)
})
# --- Predict on Test Set ---
pred_sigmoid_best <- predict(svm_sigmoid_best, newdata = test_data_svm, probability = TRUE)
prob_sigmoid_best <- attr(pred_sigmoid_best, "probabilities")[, "yes"]
# --- Evaluate ---
pred_sigmoid_best <- factor(pred_sigmoid_best, levels = c("no", "yes"))
conf_mat_sigmoid_best <- confusionMatrix(pred_sigmoid_best, true_labels, positive = "yes")
roc_obj_sigmoid_best <- roc(true_labels, prob_sigmoid_best)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_sigmoid_best <- auc(roc_obj_sigmoid_best)
# --- Print Metrics ---
print(conf_mat_sigmoid_best)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 799 15
## yes 118 97
##
## Accuracy : 0.8707
## 95% CI : (0.8487, 0.8906)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9825
##
## Kappa : 0.5253
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.86607
## Specificity : 0.87132
## Pos Pred Value : 0.45116
## Neg Pred Value : 0.98157
## Prevalence : 0.10884
## Detection Rate : 0.09427
## Detection Prevalence : 0.20894
## Balanced Accuracy : 0.86870
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_sigmoid_best$overall["Accuracy"], "\n")
## Accuracy: 0.8707483
cat("F1 Score:", conf_mat_sigmoid_best$byClass["F1"], "\n")
## F1 Score: 0.5932722
cat("AUC:", auc_val_sigmoid_best, "\n")
## AUC: 0.9411123
The grid search over C and gamma combinations meaningfully improved the sigmoid kernel SVM’s performance. During cross-validation, the model with C = 10 and gamma = 0.001 achieved the highest F1 score (0.8878) and AUC (0.9441) across all sigmoid configurations, indicating excellent discrimination and balance between precision and recall on the balanced folds.
When the model was retrained on the entire training set and evaluated on the unseen test data, it maintained strong performance:

- Accuracy: 0.8707
- F1 Score: 0.5933
- AUC: 0.9411

The drop from the cross-validated F1 (0.8878) to the test F1 (0.5933) is not overfitting: the CV folds are SMOTE-balanced while the test set keeps the natural ~11% subscriber prevalence, and F1 is prevalence-sensitive where AUC is not (see the check below). The high sensitivity and balanced accuracy further suggest that this configuration is well-suited for recall-sensitive tasks — such as identifying likely responders in a bank marketing campaign — where missing a positive case has financial consequences.
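A quick prevalence check makes the F1 gap concrete (sketch, reusing objects already in scope):

prop.table(table(train_data_smote$y))  # ~50/50 after SMOTE (what CV sees)
prop.table(table(true_labels))         # ~89/11 natural prevalence (test set)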
Evaluate the generalization capability of a polynomial SVM with degree = 2 under default hyperparameter settings, using 5-fold cross-validation and test set evaluation.

- Fixed degree = 2, scale = 1, C = 1
- Trained with caret::train() using metric = "ROC"
# --- Polynomial Kernel (Degree = 2) - Default Hyperparameters ---
suppressWarnings({
svm_poly2_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmPoly",
trControl = ctrl,
tuneGrid = expand.grid(degree = 2, scale = 1, C = 1),
metric = "ROC"
)
})
# --- Predict on Test Set ---
pred_poly2 <- predict(svm_poly2_cv, newdata = test_data_svm)
prob_poly2 <- predict(svm_poly2_cv, newdata = test_data_svm, type = "prob")[, "yes"]
# --- Evaluate on Test Set ---
pred_poly2 <- factor(pred_poly2, levels = c("no", "yes"))
conf_mat_poly2 <- confusionMatrix(pred_poly2, true_labels, positive = "yes")
roc_obj_poly2 <- roc(true_labels, prob_poly2)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_poly2 <- auc(roc_obj_poly2)
# --- Save as Default for Comparison ---
conf_mat_poly2_default <- conf_mat_poly2
auc_val_poly2_default <- auc_val_poly2
# --- Print Results ---
print(conf_mat_poly2_default)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 834 67
## yes 83 45
##
## Accuracy : 0.8542
## 95% CI : (0.8312, 0.8752)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9999
##
## Kappa : 0.2929
##
## Mcnemar's Test P-Value : 0.2207
##
## Sensitivity : 0.40179
## Specificity : 0.90949
## Pos Pred Value : 0.35156
## Neg Pred Value : 0.92564
## Prevalence : 0.10884
## Detection Rate : 0.04373
## Detection Prevalence : 0.12439
## Balanced Accuracy : 0.65564
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_poly2_default$overall["Accuracy"], "\n")
## Accuracy: 0.8542274
cat("F1 Score:", conf_mat_poly2_default$byClass["F1"], "\n")
## F1 Score: 0.375
cat("AUC:", auc_val_poly2_default, "\n")
## AUC: 0.7407112
The model demonstrated reasonable accuracy and strong specificity, but relatively weak sensitivity and F1 score, suggesting that it favored the majority class and had difficulty detecting term deposit subscribers. This behavior reflects the challenges of using untuned polynomial kernels on imbalanced datasets.
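An alternative lever for the same imbalance problem (not used in this report) is cost-sensitive training: e1071::svm() accepts a class.weights argument that penalizes minority-class errors more heavily. A minimal sketch, assuming the same train_data_smote and test_data_svm objects; the weight of 8 is illustrative, not tuned:
# Hypothetical cost-sensitive variant of the degree-2 polynomial SVM
svm_poly2_weighted <- svm(
  y ~ ., data = train_data_smote,
  kernel = "polynomial", degree = 2,
  cost = 1, probability = TRUE,
  class.weights = c(no = 1, yes = 8)  # heavier penalty on missed subscribers
)
pred_weighted <- predict(svm_poly2_weighted, newdata = test_data_svm)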
Objective: improve on the default parameters by tuning the C and scale hyperparameters of a polynomial kernel SVM (degree = 2) via grid search with 5-fold cross-validation, then evaluating on the holdout test set.
Tuning approach:
- caret::train() with tuneGrid = expand.grid(...)
- C values: 0.01, 0.1, 1, 10
- scale values: 0.01, 0.1, 1, 10
- degree = 2 (fixed)
# --- Polynomial Kernel Grid Search using caret::train() (Degree = 2) ---
# Load libraries
library(caret)
library(pROC)
library(e1071)
# Set seed for reproducibility
set.seed(123)
# Define 5-fold CV
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE, summaryFunction = twoClassSummary)
# Define parameter grid
poly_grid <- expand.grid(
degree = 2, # fixed degree
scale = c(0.01, 0.1, 1, 10),
C = c(0.01, 0.1, 1, 10)
)
# Train with grid search using caret
suppressWarnings({
svm_poly2_grid_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmPoly",
trControl = ctrl,
tuneGrid = poly_grid,
metric = "ROC"
)
})
## line search fails -0.4769403 -0.04582909 4.236149e-05 -3.142734e-06 -9.585087e-09 1.025914e-08 -4.382803e-13line search fails -0.6182547 0.01990931 1.301933e-05 -1.270972e-06 -4.857167e-09 4.605609e-09 -6.909066e-14
# View best parameters
best_poly2_params <- svm_poly2_grid_cv$bestTune
print(best_poly2_params)
## degree scale C
## 9 2 1 0.01
# Predict on test set
pred_poly2 <- predict(svm_poly2_grid_cv, newdata = test_data_svm)
prob_poly2 <- predict(svm_poly2_grid_cv, newdata = test_data_svm, type = "prob")[, "yes"]
# Evaluate
pred_poly2 <- factor(pred_poly2, levels = c("no", "yes"))
conf_mat_poly2_best <- confusionMatrix(pred_poly2, true_labels, positive = "yes")
roc_obj_poly2_best <- roc(true_labels, prob_poly2)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_poly2_best <- auc(roc_obj_poly2_best)
# Output
print(conf_mat_poly2_best)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 845 72
## yes 72 40
##
## Accuracy : 0.8601
## 95% CI : (0.8373, 0.8807)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9992
##
## Kappa : 0.2786
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.35714
## Specificity : 0.92148
## Pos Pred Value : 0.35714
## Neg Pred Value : 0.92148
## Prevalence : 0.10884
## Detection Rate : 0.03887
## Detection Prevalence : 0.10884
## Balanced Accuracy : 0.63931
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_poly2_best$overall["Accuracy"], "\n")
## Accuracy: 0.8600583
cat("F1 Score:", conf_mat_poly2_best$byClass["F1"], "\n")
## F1 Score: 0.3571429
cat("AUC:", auc_val_poly2_best, "\n")
## AUC: 0.7777789
The grid search selected C = 0.01 and scale = 1 for the degree-2 polynomial kernel. On the test set this configuration achieved an accuracy of 86.0%, an F1 score of 0.357, and an AUC of 0.778. The AUC improved over the default configuration (0.741), but recall actually fell (sensitivity 0.357 vs. 0.402) and the F1 score did not improve (0.357 vs. 0.375). Even after tuning, the quadratic kernel remains weak at identifying term deposit subscribers, which suggests the kernel family, rather than the hyperparameters, is the limiting factor on this dataset.
Evaluate whether a higher-order polynomial kernel (degree = 3) improves predictive performance over simpler degree = 2 models by capturing more complex nonlinear relationships in the bank marketing dataset.
Settings:
- degree = 3, scale = 1, C = 1
- trained with caret::train() and 5-fold CV
suppressWarnings({
svm_poly3_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmPoly",
trControl = ctrl,
tuneGrid = expand.grid(degree = 3, scale = 1, C = 1),
metric = "ROC"
)
})
pred_poly3 <- predict(svm_poly3_cv, newdata = test_data_svm)
prob_poly3 <- predict(svm_poly3_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_poly3 <- factor(pred_poly3, levels = c("no", "yes"))
conf_mat_poly3 <- confusionMatrix(pred_poly3, true_labels, positive = "yes")
roc_obj_poly3 <- roc(true_labels, prob_poly3)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_poly3 <- auc(roc_obj_poly3)
print(conf_mat_poly3)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 844 71
## yes 73 41
##
## Accuracy : 0.8601
## 95% CI : (0.8373, 0.8807)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9992
##
## Kappa : 0.2842
##
## Mcnemar's Test P-Value : 0.9336
##
## Sensitivity : 0.36607
## Specificity : 0.92039
## Pos Pred Value : 0.35965
## Neg Pred Value : 0.92240
## Prevalence : 0.10884
## Detection Rate : 0.03984
## Detection Prevalence : 0.11079
## Balanced Accuracy : 0.64323
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_poly3$overall["Accuracy"], "\n")
## Accuracy: 0.8600583
cat("F1 Score:", conf_mat_poly3$byClass["F1"], "\n")
## F1 Score: 0.3628319
cat("AUC:", auc_val_poly3, "\n")
## AUC: 0.8151873
The degree = 3 polynomial SVM achieved an accuracy of 86.0%, an F1 score of 0.363, and an AUC of 0.815. Its AUC is the best among the polynomial models (vs. 0.741 for the default degree = 2 and 0.778 for the tuned degree = 2), but its sensitivity (0.366 vs. 0.402) and F1 (0.363 vs. 0.375) fall slightly below the default configuration.
In short, the third-degree polynomial adds nonlinear depth and improves ranking ability (AUC), but that does not translate into better detection of subscribers. For this marketing dataset, none of the polynomial kernels matches the recall and F1 of the linear and sigmoid models, so the added complexity is hard to justify for real-world applications like identifying high-potential customers.
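Concretely, the "nonlinear depth" here is just the exponent in the polynomial kernel k(x, z) = (scale * <x, z> + offset)^degree. A toy illustration with kernlab (the backend caret uses for svmPoly), on made-up two-dimensional vectors:
# Compare degree-2 and degree-3 polynomial kernel values on toy vectors
library(kernlab)
x <- c(1, 0.5); z <- c(0.2, 1)          # x . z = 0.7
k2 <- polydot(degree = 2, scale = 1, offset = 1)
k3 <- polydot(degree = 3, scale = 1, offset = 1)
k2(x, z)  # (0.7 + 1)^2 = 2.89
k3(x, z)  # (0.7 + 1)^3 = 4.913
Raising the degree inflates similarity scores for already-similar points; that extra curvature is exactly what failed to pay off above.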
Perform a grid search over multiple combinations of C and σ (sigma) for the Radial Basis Function (RBF) kernel using 5-fold cross-validation. The goal is to optimize model performance through parameter tuning.
Settings:
- C ∈ {0.01, 0.1, 1, 10}
- σ ∈ {0.001, 0.01, 0.1}
- trained with caret::train() and 5-fold CV
suppressWarnings({
svm_rbf_grid_cv <- train(
y ~ .,
data = train_data_smote,
method = "svmRadial",
trControl = ctrl,
tuneGrid = expand.grid(C = c(0.01, 0.1, 1, 10), sigma = c(0.001, 0.01, 0.1)),
metric = "ROC"
)
})
pred_rbf_grid <- predict(svm_rbf_grid_cv, newdata = test_data_svm)
prob_rbf_grid <- predict(svm_rbf_grid_cv, newdata = test_data_svm, type = "prob")[, "yes"]
pred_rbf_grid <- factor(pred_rbf_grid, levels = c("no", "yes"))
conf_mat_rbf_grid <- confusionMatrix(pred_rbf_grid, true_labels, positive = "yes")
roc_obj_rbf_grid <- roc(true_labels, prob_rbf_grid)
## Setting levels: control = no, case = yes
## Setting direction: controls < cases
auc_val_rbf_grid <- auc(roc_obj_rbf_grid)
print(conf_mat_rbf_grid)
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 899 108
## yes 18 4
##
## Accuracy : 0.8776
## 95% CI : (0.856, 0.897)
## No Information Rate : 0.8912
## P-Value [Acc > NIR] : 0.9248
##
## Kappa : 0.0248
##
## Mcnemar's Test P-Value : 2.214e-15
##
## Sensitivity : 0.035714
## Specificity : 0.980371
## Pos Pred Value : 0.181818
## Neg Pred Value : 0.892751
## Prevalence : 0.108844
## Detection Rate : 0.003887
## Detection Prevalence : 0.021380
## Balanced Accuracy : 0.508043
##
## 'Positive' Class : yes
##
cat("Accuracy:", conf_mat_rbf_grid$overall["Accuracy"], "\n")
## Accuracy: 0.877551
cat("F1 Score:", conf_mat_rbf_grid$byClass["F1"], "\n")
## F1 Score: 0.05970149
cat("AUC:", auc_val_rbf_grid, "\n")
## AUC: 0.7543718
Despite respectable accuracy (87.8%), the grid-tuned RBF kernel collapsed on the minority class: it predicted "yes" for only 22 of the 1,029 test cases, yielding a sensitivity of 0.036 and an F1 score of 0.060. In other words, the model was highly confident in predicting the majority class ("no") but almost never identified actual subscribers ("yes").
In essence, this model is over-conservative, missing nearly all true positives. The results emphasize that accuracy alone is not a sufficient metric in imbalanced classification. For decision-making contexts like marketing, where identifying potential subscribers is crucial, models with higher F1 and sensitivity (such as the tuned linear or sigmoid kernels) offer far more actionable insights.
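Because the RBF model still ranks reasonably well (AUC = 0.75), one cheap remedy worth noting is re-thresholding its probabilities instead of retraining. A sketch using the objects already computed above; coords() from pROC picks the cutoff that maximizes Youden's J (recent pROC versions return a data frame):
# Re-classify the RBF test-set probabilities at a recall-friendlier cutoff
best_cut <- coords(roc_obj_rbf_grid, "best", best.method = "youden",
                   ret = c("threshold", "sensitivity", "specificity"))
pred_rbf_recut <- factor(ifelse(prob_rbf_grid >= best_cut$threshold, "yes", "no"),
                         levels = c("no", "yes"))
confusionMatrix(pred_rbf_recut, true_labels, positive = "yes")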
While grid search is a powerful tool for model tuning, its real value lies in what it reveals about the data and how the model interacts with it. This section interprets why certain hyperparameter combinations worked best for specific SVM kernels.
- Sigmoid kernel: the winning combination (C = 10, gamma = 0.001) pairs a very low gamma, which limits the influence of individual data points and produces smoother, more generalized decision boundaries, with a larger C that compensates by penalizing misclassification more strongly.
- Polynomial kernel (degree = 2): the selected soft margin (C = 0.01, scale = 1) points to generalization over precision. A soft margin tolerates noisy or overlapping data, and a scale of 1 keeps the polynomial expansion from distorting features excessively.
- RBF kernel: a small sigma = 0.01 combined with a soft margin C = 0.1 avoids tight decision boundaries that overfit to noise; even so, the RBF model remained over-conservative on the test set, and larger C values only degraded performance further.
- Across the grid searches, small C values (0.1 or 0.01) dominated for the radial and polynomial kernels, underscoring the importance of soft margins on this dataset: there is enough noise and class overlap that pushing for strict separability hurts performance.
Grid search didn't just optimize performance: it revealed the geometry of the data space post-SMOTE and transformations. It is smooth, moderately non-linear, and rewards generalization over sharp precision.
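One way to inspect that geometry directly is the full resampling grid that caret stores on each train object, rather than just bestTune; a quick sketch for the RBF search:
# Cross-validated performance surface for the RBF grid, best ROC first
svm_rbf_grid_cv$results %>%
  arrange(desc(ROC)) %>%
  dplyr::select(sigma, C, ROC, Sens, Spec) %>%
  head(5)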
# --- SVM Results Summary Table ---
library(dplyr)
library(ggplot2)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
# Step 1: Best sigmoid model (from grid search)
best_sigmoid <- data.frame(
Model = "SVM (CV): Sigmoid (Grid Search Best)",
Accuracy = as.numeric(conf_mat_sigmoid_best$overall["Accuracy"]),
F1 = as.numeric(conf_mat_sigmoid_best$byClass["F1"]),
AUC = as.numeric(auc_val_sigmoid_best)
)
# Step 2: Best polynomial (degree 2) model (from grid search)
best_poly <- data.frame(
Model = "SVM (CV): Polynomial (Deg=2, Grid Search Best)",
Accuracy = as.numeric(conf_mat_poly2_best$overall["Accuracy"]),
F1 = as.numeric(conf_mat_poly2_best$byClass["F1"]),
AUC = as.numeric(auc_val_poly2_best)
)
# Step 3: Best RBF model (from grid search)
best_rbf <- data.frame(
Model = "SVM (CV): RBF (Grid Search Best)",
Accuracy = as.numeric(conf_mat_rbf_grid$overall["Accuracy"]),
F1 = as.numeric(conf_mat_rbf_grid$byClass["F1"]),
AUC = as.numeric(auc_val_rbf_grid)
)
# Step 4: Original SVM summary table
svm_results_summary <- data.frame(
Model = c(
"SVM (CV): Linear (C=1)",
"SVM (CV): Linear (C=0.01)",
"SVM (CV): Linear (C=10)",
"SVM (CV): Radial (Default Gamma)",
"SVM (CV): Radial (Gamma = 0.1)",
"SVM (CV): Radial (C = 0.01, Gamma = 0.1)",
"SVM (CV): Sigmoid (Manual CV)",
"SVM (CV): Polynomial (Degree = 2)",
"SVM (CV): Polynomial (Degree = 3)",
"SVM (CV): RBF Grid Search"
),
Accuracy = c(
conf_mat_linear$overall["Accuracy"],
conf_mat_linear_lowC$overall["Accuracy"],
conf_mat_linear_highC$overall["Accuracy"],
conf_mat_radial$overall["Accuracy"],
conf_mat_radial_gamma$overall["Accuracy"],
conf_mat_radial_soft$overall["Accuracy"],
cv_means["Accuracy"],
conf_mat_poly2_default$overall["Accuracy"],
conf_mat_poly3$overall["Accuracy"],
conf_mat_rbf_grid$overall["Accuracy"]
),
F1 = c(
conf_mat_linear$byClass["F1"],
conf_mat_linear_lowC$byClass["F1"],
conf_mat_linear_highC$byClass["F1"],
conf_mat_radial$byClass["F1"],
conf_mat_radial_gamma$byClass["F1"],
conf_mat_radial_soft$byClass["F1"],
cv_means["F1"],
conf_mat_poly2_default$byClass["F1"],
conf_mat_poly3$byClass["F1"],
conf_mat_rbf_grid$byClass["F1"]
),
AUC = c(
auc_val_linear,
auc_val_linear_lowC,
auc_val_linear_highC,
auc_val_radial,
auc_val_radial_gamma,
auc_val_radial_soft,
cv_means["AUC"],
auc_val_poly2_default,
auc_val_poly3,
auc_val_rbf_grid
)
)
# Step 5: Append best grid search models
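# Note: best_rbf reuses conf_mat_rbf_grid and auc_val_rbf_grid, so the RBF
# grid-search model appears twice with identical scores in the sorted table.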
svm_results_summary <- rbind(svm_results_summary, best_sigmoid, best_poly, best_rbf)
# Step 6: Sort by F1 > AUC > Accuracy
svm_results_summary <- svm_results_summary %>%
arrange(desc(F1), desc(AUC), desc(Accuracy))
# Step 7: Print Summary
print(svm_results_summary)
## Model Accuracy F1
## 1 SVM (CV): Sigmoid (Manual CV) 0.6486728 0.64574053
## 2 SVM (CV): Sigmoid (Grid Search Best) 0.8707483 0.59327217
## 3 SVM (CV): Linear (C=0.01) 0.8697765 0.58895706
## 4 SVM (CV): Linear (C=1) 0.8697765 0.58895706
## 5 SVM (CV): Linear (C=10) 0.8678328 0.58536585
## 6 SVM (CV): Radial (Default Gamma) 0.8960155 0.57707510
## 7 SVM (CV): Polynomial (Degree = 2) 0.8542274 0.37500000
## 8 SVM (CV): Polynomial (Degree = 3) 0.8600583 0.36283186
## 9 SVM (CV): Polynomial (Deg=2, Grid Search Best) 0.8600583 0.35714286
## 10 SVM (CV): RBF Grid Search 0.8775510 0.05970149
## 11 SVM (CV): RBF (Grid Search Best) 0.8775510 0.05970149
## 12 SVM (CV): Radial (Gamma = 0.1) 0.8746356 0.05839416
## 13 SVM (CV): Radial (C = 0.01, Gamma = 0.1) 0.8746356 0.03007519
## AUC
## 1 0.6982548
## 2 0.9411123
## 3 0.9396616
## 4 0.9358837
## 5 0.9359227
## 6 0.9213565
## 7 0.7407112
## 8 0.8151873
## 9 0.7777789
## 10 0.7543718
## 11 0.7543718
## 12 0.7555986
## 13 0.7258043
# --- Heatmap of SVM Results ---
# Step 8: Preserve F1 order before melting
f1_order <- svm_results_summary %>%
dplyr::select(Model, F1) %>%
arrange(desc(F1))
# Step 9: Melt the dataframe to long format
svm_melted <- melt(svm_results_summary, id.vars = "Model")
# Step 10: Join F1 values back for reordering
svm_melted <- svm_melted %>%
left_join(f1_order, by = "Model") %>%
mutate(Model = reorder(Model, -F1))
# Step 11: Plot heatmap
ggplot(svm_melted, aes(x = variable, y = Model, fill = value)) +
geom_tile(color = "white") +
geom_text(aes(label = round(value, 4)), color = "white", size = 3.5) +
scale_fill_gradientn(colors = c("#ffffcc", "#41b6c4", "#253494"),
name = "Score", limits = c(0, 1), oob = squish) +
labs(title = "SVM Models Performance Heatmap (Sorted by F1)",
x = NULL, y = "Model") +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# --- ROC Plot with All Models ---
plot(roc_obj_linear, col = "blue", lwd = 1.5, main = "ROC Curves for SVM Models")
plot(roc_obj_linear_lowC, col = "darkgreen", lwd = 1.5, add = TRUE)
plot(roc_obj_linear_highC, col = "orange", lwd = 1.5, add = TRUE)
plot(roc_obj_radial, col = "purple", lwd = 1.5, add = TRUE)
plot(roc_obj_radial_gamma, col = "red", lwd = 1.5, add = TRUE)
plot(roc_obj_radial_soft, col = "cyan", lwd = 1.5, add = TRUE)
plot(roc_obj_sigmoid, col = "brown", lwd = 1.5, add = TRUE)
plot(roc_obj_poly2_best, col = "darkblue", lwd = 1.5, add = TRUE)
plot(roc_obj_poly3, col = "darkred", lwd = 1.5, add = TRUE)
plot(roc_obj_rbf_grid, col = "black", lwd = 1.5, add = TRUE)
plot(roc_obj_sigmoid_best, col = "magenta", lwd = 1.5, add = TRUE)
legend("topright", inset = c(0.2, 0.25), xpd = TRUE,
legend = c(
"Linear (C=1)", "Linear (C=0.01)", "Linear (C=10)",
"Radial Default", "Radial Gamma=0.1", "Radial C=0.01,Gamma=0.1",
"Sigmoid Manual CV", "Poly Deg=2 (Grid)", "Poly Deg=3", "RBF Grid", "Sigmoid Grid"
),
col = c("blue", "darkgreen", "orange", "purple", "red", "cyan",
"brown", "darkblue", "darkred", "black", "magenta"),
lwd = 1.5,
cex = 0.9,
box.lty = 0,
bg = "white")
Polynomial and RBF performance, by contrast, remained highly sensitive to the choice of C and scale.
Is the dataset approximately linearly separable? Yes: the linear kernels consistently performed well, especially with C = 0.01, suggesting the dataset becomes close to linearly separable after preprocessing steps like:
- One-hot encoding
- SMOTE balancing
- Feature scaling (Box-Cox, binning)
These steps flattened non-linear boundaries into a form that linear models could separate effectively.
If the goal is interpretability, deployability, and overall balance, choose SVM with Linear Kernel (C = 0.01).
If the mission is to maximize detection of true responders, go with Sigmoid (Grid Search Best).
Complex kernels should be reserved for deeply non-linear datasets; after preprocessing, this one does not require that level of complexity.
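For deployment, the recommended configuration comes down to a few lines; a minimal sketch, assuming the same preprocessed objects used throughout this report:
# Final recommended model: linear SVM with a soft margin
final_svm <- svm(
  y ~ ., data = train_data_smote,
  kernel = "linear", cost = 0.01,
  probability = TRUE
)
final_pred <- predict(final_svm, newdata = test_data_svm, probability = TRUE)
final_prob <- attr(final_pred, "probabilities")[, "yes"]  # for ROC/threshold work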