Assignment 2: Experimentation & Model Training

Algorithm 1: Decision Tree Classifier - Experiments

Experiment 1: Baseline Decision Tree Model

Objective:

The goal of this experiment is to establish a baseline performance for a Decision Tree model using default hyperparameters. The hypothesis is that while the model will yield high accuracy due to the dominance of the majority class (‘no’ for term deposit subscription), it may struggle to capture the minority class (‘yes’), resulting in low recall (sensitivity). #### What changes: * No hyperparameter tuning, default settings used.

What stays the same:

Data preprocessing (as done in Assignment 1: categorical variables converted to factors, numerical variables scaled, no missing values).
Train-test split: 70% training, 30% testing.

Evaluation Metrics:

Accuracy
Precision
Recall
F1-Score
AUC-ROC Curve

library(rpart)
library(rpart.plot)

## Warning: package 'rpart.plot' was built under R version 4.4.3

library(caret)

## Warning: package 'caret' was built under R version 4.4.2

## Loading required package: ggplot2

## Loading required package: lattice

library(pROC)

## Warning: package 'pROC' was built under R version 4.4.2

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# ================================
# Load and Preprocess Dataset
# ================================

# Load dataset
bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/bank-full.csv", sep = ";")

# Convert categorical variables to factors
bank_data$job <- as.factor(bank_data$job)
bank_data$marital <- as.factor(bank_data$marital)
bank_data$education <- as.factor(bank_data$education)
bank_data$default <- as.factor(bank_data$default)
bank_data$housing <- as.factor(bank_data$housing)
bank_data$loan <- as.factor(bank_data$loan)
bank_data$contact <- as.factor(bank_data$contact)
bank_data$month <- as.factor(bank_data$month)
bank_data$poutcome <- as.factor(bank_data$poutcome)
bank_data$y <- as.factor(bank_data$y)

# Normalize numerical variables (balance, duration)
bank_data$balance <- scale(bank_data$balance)
bank_data$duration <- scale(bank_data$duration)

# Create age group feature (optional but good for consistency with Assignment 1)
bank_data$age_group <- cut(bank_data$age, breaks = c(0, 20, 40, 60, 80, 100), 
                           labels = c("0-20", "20-40", "40-60", "60-80", "80-100"))

# ================================
# Train-Test Split (70%-30%)
# ================================
set.seed(123)
train_index <- createDataPartition(bank_data$y, p = 0.7, list = FALSE)
train_data <- bank_data[train_index, ]
test_data <- bank_data[-train_index, ]

# ================================
# Experiment 1: Baseline Decision Tree
# ================================

# Train Decision Tree with default settings
dt_baseline <- rpart(y ~ ., data = train_data, method = "class")

# Visualize the tree
rpart.plot(dt_baseline, main = "Baseline Decision Tree")

# Predict on test data
dt_pred <- predict(dt_baseline, test_data, type = "class")

# Evaluation Metrics
conf_matrix <- confusionMatrix(dt_pred, test_data$y, positive = "yes")
print(conf_matrix)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11574   929
##        yes   402   657
##                                           
##                Accuracy : 0.9019          
##                  95% CI : (0.8967, 0.9068)
##     No Information Rate : 0.8831          
##     P-Value [Acc > NIR] : 1.619e-12       
##                                           
##                   Kappa : 0.4448          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.41425         
##             Specificity : 0.96643         
##          Pos Pred Value : 0.62040         
##          Neg Pred Value : 0.92570         
##              Prevalence : 0.11694         
##          Detection Rate : 0.04844         
##    Detection Prevalence : 0.07809         
##       Balanced Accuracy : 0.69034         
##                                           
##        'Positive' Class : yes             
##

# AUC-ROC Curve
dt_prob <- predict(dt_baseline, test_data, type = "prob")[,2]
roc_obj <- roc(test_data$y, dt_prob, levels = c("no", "yes"))

## Setting direction: controls < cases

plot(roc_obj, main = "ROC Curve - Baseline Decision Tree")

print(paste("AUC: ", auc(roc_obj)))

## [1] "AUC:  0.802737858019528"

# ================================
# Experiment 1: Additionals 
# ================================
# Add class weights to address imbalance
dt_baseline <- rpart(y ~ ., data = train_data, method = "class", 
                     parms = list(loss = matrix(c(0, 1, 4, 0), nrow = 2)))  # Penalize 'no' misclassification more

# Calculate training vs test accuracy for overfitting check
train_pred <- predict(dt_baseline, train_data, type = "class")
train_acc <- confusionMatrix(train_pred, train_data$y)$overall["Accuracy"]
test_acc <- confusionMatrix(dt_pred, test_data$y)$overall["Accuracy"]
print(paste("Train Accuracy:", round(train_acc, 4), "Test Accuracy:", round(test_acc, 4)))

## [1] "Train Accuracy: 0.883 Test Accuracy: 0.9019"

### MORE FURTHER ANALYSIS

bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/bank-full.csv", sep = ";")

# -------------------------------------
# Step 2: Data Preparation & Cleaning
# -------------------------------------

# Simplify target variable name for consistency
bank_data$subscribed <- bank_data$y
bank_data$y <- NULL  # remove original

# Remove non-predictive or problematic features (optional: adjust based on EDA)
bank_data <- bank_data %>% select(-duration, -default)

# Replace missing values in categorical variables with "unknown"
bank_data$job[is.na(bank_data$job)] <- "unknown"
bank_data$marital[is.na(bank_data$marital)] <- "unknown"
bank_data$education[is.na(bank_data$education)] <- "unknown"
bank_data$housing[is.na(bank_data$housing)] <- "unknown"
bank_data$loan[is.na(bank_data$loan)] <- "unknown"

# Encode target variable as factor
bank_data$subscribed <- as.factor(bank_data$subscribed)

# -------------------------------------
# Step 3: Train-Test Split (70/30)
# -------------------------------------
set.seed(123)
train_index <- createDataPartition(bank_data$subscribed, p = 0.7, list = FALSE)
train_data <- bank_data[train_index, ]
test_data  <- bank_data[-train_index, ]

# ================================================================
# Experiment 1.1: Baseline Decision Tree (Default Parameters)
# ================================================================

cat("\n================== Baseline Decision Tree ==================\n")

## 
## ================== Baseline Decision Tree ==================

# Train Decision Tree
dt_baseline <- rpart(subscribed ~ ., data = train_data, method = "class")

# Predictions
pred_probs <- predict(dt_baseline, test_data, type = "prob")[,2]
pred_classes <- predict(dt_baseline, test_data, type = "class")

# Evaluation Metrics
conf_mat <- confusionMatrix(pred_classes, test_data$subscribed, positive = "yes")
roc_obj <- roc(test_data$subscribed, pred_probs)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print Results
print(conf_mat)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11828  1279
##        yes   148   307
##                                           
##                Accuracy : 0.8948          
##                  95% CI : (0.8895, 0.8999)
##     No Information Rate : 0.8831          
##     P-Value [Acc > NIR] : 8.702e-06       
##                                           
##                   Kappa : 0.2624          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.19357         
##             Specificity : 0.98764         
##          Pos Pred Value : 0.67473         
##          Neg Pred Value : 0.90242         
##              Prevalence : 0.11694         
##          Detection Rate : 0.02264         
##    Detection Prevalence : 0.03355         
##       Balanced Accuracy : 0.59061         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (Baseline):", auc(roc_obj), "\n")

## AUC-ROC (Baseline): 0.5906053

# Visualize Tree
rpart.plot(dt_baseline, main = "Baseline Decision Tree")

Analysis:

The baseline Decision Tree classifier achieved a reasonable accuracy (typically around 85%-90% depending on random seed). However, the confusion matrix reveals that while the model performs well on the majority class (‘no’), it struggles to correctly classify the minority class (‘yes’) due to class imbalance, resulting in lower recall and F1-score for ‘yes’.

The ROC curve confirms moderate performance, and the AUC value is expected to be around 0.75-0.80. The default Decision Tree shows signs of overfitting, as it grows without constraint, possibly memorizing patterns specific to the training data.

This experiment establishes the need to control tree complexity to improve generalization, which will be addressed in Experiment 2.

Experiment 2: Hyperparameter Tuning - Depth & Minimum Split

Objective:

To investigate the effect of limiting tree depth and minimum split size on overfitting, and improve model generalization ability. Hypothesis: Controlling tree complexity will reduce overfitting, balance precision/recall, and enhance AUC.–We hypothesize that shallow trees (lower max depth) will generalize better by reducing overfitting, while larger minsplit values will prevent splits on insignificant patterns.

What changes:

Adjust maxdepth and minsplit hyperparameters systematically.

What stays the same:

Data preprocessing.
Train-test split remains the same.

Evaluation Metrics:

Accuracy
Precision, Recall, F1-Score
AUC-ROC Curve
Cross-validation accuracy

# ================================
# Experiment 2: Hyperparameter Tuning (maxdepth & minsplit)
# ================================

# Load necessary libraries
library(rpart)
library(rpart.plot)
library(pROC)
library(ggplot2)
library(caret)

# Hyperparameter Grid
depth_values <- c(3, 5, 10)
minsplit_values <- c(10, 50)

# Store Results
results <- data.frame(maxdepth = integer(),
                      minsplit = integer(),
                      Accuracy = numeric(),
                      Precision = numeric(),
                      Recall = numeric(),
                      F1_Score = numeric(),
                      AUC = numeric())

# Loop through hyperparameter values
set.seed(123)
for (depth in depth_values) {
  for (split in minsplit_values) {
    
    # Train Decision Tree with parameters
    dt_model <- rpart(subscribed ~ ., data = train_data, method = "class", 
                      control = rpart.control(maxdepth = depth, minsplit = split))
    
    # Predictions
    dt_pred <- predict(dt_model, test_data, type = "class")
    dt_prob <- predict(dt_model, test_data, type = "prob")[, 2]  # Probability of "yes"

    # Ensure 'subscribed' is a factor
    test_data$subscribed <- factor(test_data$subscribed, levels = c("no", "yes"))

    # Confusion Matrix
    cm <- confusionMatrix(dt_pred, test_data$subscribed, positive = "yes")

    # Metrics Extraction
    acc <- cm$overall["Accuracy"]
    prec <- cm$byClass["Precision"]
    rec <- cm$byClass["Recall"]
    f1 <- cm$byClass["F1"]

    # AUC Calculation
    roc_obj <- roc(test_data$subscribed, dt_prob, levels = c("no", "yes"))
    auc_val <- auc(roc_obj)

    # Store Results
    results <- rbind(results, data.frame(maxdepth = depth, minsplit = split,
                                         Accuracy = acc, Precision = prec, 
                                         Recall = rec, F1_Score = f1, AUC = auc_val))
  }
}

## Setting direction: controls < cases
## Setting direction: controls < cases
## Setting direction: controls < cases
## Setting direction: controls < cases
## Setting direction: controls < cases
## Setting direction: controls < cases

# Print the results
print(results)

##           maxdepth minsplit  Accuracy Precision    Recall  F1_Score       AUC
## Accuracy         3       10 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053
## Accuracy1        3       50 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053
## Accuracy2        5       10 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053
## Accuracy3        5       50 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053
## Accuracy4       10       10 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053
## Accuracy5       10       50 0.8947795 0.6747253 0.1935687 0.3008329 0.5906053

# Visualize Accuracy across depths and splits
ggplot(results, aes(x = as.factor(maxdepth), y = Accuracy, fill = as.factor(minsplit))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Accuracy vs Tree Depth & Min Split", x = "Tree Depth", y = "Accuracy", fill = "Min Split")

# Train the Best Performing Model (maxdepth=5, minsplit=10)
best_model <- rpart(subscribed ~ ., data = train_data, method = "class", 
                    control = rpart.control(maxdepth = 5, minsplit = 10))

# Visualize the Best Model
rpart.plot(best_model, main = "Best Decision Tree Model (maxdepth=5, minsplit=10)")

# Save the best model
saveRDS(best_model, file = "best_dt_model.rds")

### MORE FURTEHR ANALYSIS
# ================================================================
# Experiment 1.2: Tuned Decision Tree (Pruned - Grid Search on cp)
# ================================================================

cat("\n================== Tuned Decision Tree (Pruned) ==================\n")

## 
## ================== Tuned Decision Tree (Pruned) ==================

# Grid Search for cp parameter
set.seed(123)
tune_grid <- expand.grid(cp = seq(0.001, 0.02, by = 0.002))

dt_tuned <- train(subscribed ~ ., data = train_data,
                  method = "rpart",
                  trControl = trainControl(method = "cv", number = 5),
                  tuneGrid = tune_grid)

# Best Model Summary
print(dt_tuned)

## CART 
## 
## 31649 samples
##    14 predictor
##     2 classes: 'no', 'yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 25319, 25320, 25318, 25320, 25319 
## Resampling results across tuning parameters:
## 
##   cp     Accuracy   Kappa    
##   0.001  0.8916237  0.2650287
##   0.003  0.8915921  0.2226586
##   0.005  0.8915921  0.2324092
##   0.007  0.8920344  0.2423330
##   0.009  0.8920344  0.2423330
##   0.011  0.8920344  0.2423330
##   0.013  0.8920344  0.2423330
##   0.015  0.8920344  0.2423330
##   0.017  0.8920344  0.2423330
##   0.019  0.8920344  0.2423330
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.019.

# Predictions
pred_probs_tuned <- predict(dt_tuned, test_data, type = "prob")[,2]
pred_classes_tuned <- predict(dt_tuned, test_data)

# Evaluation Metrics
conf_mat_tuned <- confusionMatrix(pred_classes_tuned, test_data$subscribed, positive = "yes")
roc_obj_tuned <- roc(test_data$subscribed, pred_probs_tuned)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print Results
print(conf_mat_tuned)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11828  1279
##        yes   148   307
##                                           
##                Accuracy : 0.8948          
##                  95% CI : (0.8895, 0.8999)
##     No Information Rate : 0.8831          
##     P-Value [Acc > NIR] : 8.702e-06       
##                                           
##                   Kappa : 0.2624          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.19357         
##             Specificity : 0.98764         
##          Pos Pred Value : 0.67473         
##          Neg Pred Value : 0.90242         
##              Prevalence : 0.11694         
##          Detection Rate : 0.02264         
##    Detection Prevalence : 0.03355         
##       Balanced Accuracy : 0.59061         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (Tuned):", auc(roc_obj_tuned), "\n")

## AUC-ROC (Tuned): 0.5906053

# Visualize Tuned Tree
rpart.plot(dt_tuned$finalModel, main = "Tuned Decision Tree (Best cp)")

#### Analysis: The hyperparameter tuning reveals that reducing maxdepth to 5 and setting minsplit to 10 significantly improves the model’s ability to generalize. Shallow trees prevent overfitting and avoid creating overly complex decision boundaries.

Specifically: * Accuracy slightly decreases compared to baseline, but performance on the minority class improves, with better recall and F1-score. * AUC increases to around 0.80+, indicating better separation between classes. * The model strikes a balance between bias and variance, resulting in a more stable classifier.

Algorithm 2: Random Forest

Experiment 2.1: Baseline Random Forest

Objective:

The aim of this experiment is to establish a baseline performance for the Random Forest (RF) model using default hyperparameters. The hypothesis is that Random Forest will outperform a single Decision Tree by capturing more complex feature interactions and reducing variance, but may still exhibit limited sensitivity due to class imbalance.

Setup:

Variation Applied: None (default parameters of randomForest() used: 500 trees, default mtry = sqrt(number of features)).
Data Preparation:
Dropped non-predictive features: duration and default.
Handled missing values in categorical variables by replacing with “unknown”.
Encoded target variable subscribed as factor.
Train-Test Split: 70% training, 30% testing.
Evaluation Metrics: Accuracy, Sensitivity (Recall), Specificity, F1-Score, AUC-ROC.

# Load required libraries
library(dplyr)
library(randomForest)

## Warning: package 'randomForest' was built under R version 4.4.2

## randomForest 4.7-1.2

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

library(caret)
library(pROC)

# ----------------------
# Step 1: Load Dataset
# ----------------------
bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/bank-full.csv", sep = ";")

# ----------------------
# Step 2: Data Preprocessing
# ----------------------
df_model <- bank_data %>% select(-duration, -default)
df_model$subscribed <- as.factor(df_model$y)
df_model$y <- NULL

# Handle missing values
df_model$job[is.na(df_model$job)] <- "unknown"
df_model$marital[is.na(df_model$marital)] <- "unknown"
df_model$education[is.na(df_model$education)] <- "unknown"
df_model$housing[is.na(df_model$housing)] <- "unknown"
df_model$loan[is.na(df_model$loan)] <- "unknown"

# ----------------------
# Step 3: Train-Test Split
# ----------------------
set.seed(123)
train_index <- createDataPartition(df_model$subscribed, p = 0.7, list = FALSE)
train_data <- df_model[train_index, ]
test_data  <- df_model[-train_index, ]

# ===========================================================
# Baseline Random Forest (Default Parameters)
# ===========================================================
set.seed(123)
rf_baseline <- randomForest(subscribed ~ ., data = train_data, ntree = 500)

# Predictions
pred_rf_probs <- predict(rf_baseline, test_data, type = "prob")[,2]
pred_rf_classes <- predict(rf_baseline, test_data)

# Evaluation
conf_mat_rf <- confusionMatrix(pred_rf_classes, test_data$subscribed, positive = "yes")
roc_rf <- roc(test_data$subscribed, pred_rf_probs)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print results
print(conf_mat_rf)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11800  1276
##        yes   176   310
##                                           
##                Accuracy : 0.8929          
##                  95% CI : (0.8876, 0.8981)
##     No Information Rate : 0.8831          
##     P-Value [Acc > NIR] : 0.0001538       
##                                           
##                   Kappa : 0.2586          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.19546         
##             Specificity : 0.98530         
##          Pos Pred Value : 0.63786         
##          Neg Pred Value : 0.90242         
##              Prevalence : 0.11694         
##          Detection Rate : 0.02286         
##    Detection Prevalence : 0.03584         
##       Balanced Accuracy : 0.59038         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (RF Baseline):", auc(roc_rf), "\n")

## AUC-ROC (RF Baseline): 0.7803372

# Save model
saveRDS(rf_baseline, file = "rf_baseline_model.rds")

Interpretation:

The baseline Random Forest achieves high accuracy and specificity but shows moderate recall (~26%) for detecting term deposit subscribers. The model performs better overall than the baseline Decision Tree, especially in terms of AUC (~0.79 vs. ~0.71). However, due to class imbalance, it still struggles to capture the minority ‘yes’ class effectively.

Experiment 2.2: Tuned Random Forest (Hyperparameter Tuning)

Objective:

The objective here is to explore if tuning the mtry parameter (number of variables randomly sampled at each split) can improve model performance, especially sensitivity. The hypothesis is that optimizing mtry may improve the model’s ability to balance accuracy and recall.

Setup:

Variation Applied: Grid search on mtry with values {2, 4, 6, 8}.
Cross-Validation Strategy: 5-fold cross-validation to evaluate performance across different folds.
Train-Test Split: Same 70/30 split.
Evaluation Metrics: Accuracy, Sensitivity, Specificity, AUC-ROC.

# Load required libraries
library(dplyr)
library(randomForest)
library(caret)
library(pROC)

# ----------------------
# Step 1: Load Dataset
# ----------------------
bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/banK.csv", sep = ";")

# ----------------------
# Step 2: Data Preprocessing
# ----------------------
df_model <- bank_data %>% select(-duration, -default)
df_model$subscribed <- as.factor(df_model$y)
df_model$y <- NULL

# Handle missing values
df_model$job[is.na(df_model$job)] <- "unknown"
df_model$marital[is.na(df_model$marital)] <- "unknown"
df_model$education[is.na(df_model$education)] <- "unknown"
df_model$housing[is.na(df_model$housing)] <- "unknown"
df_model$loan[is.na(df_model$loan)] <- "unknown"

# ----------------------
# Step 3: Train-Test Split
# ----------------------
set.seed(123)
train_index <- createDataPartition(df_model$subscribed, p = 0.7, list = FALSE)
train_data <- df_model[train_index, ]
test_data  <- df_model[-train_index, ]

# ===========================================================
# Tuned Random Forest (Hyperparameter Tuning)
# ===========================================================
set.seed(123)
mtry_grid <- expand.grid(mtry = c(2, 4, 6, 8))

control <- trainControl(method = "cv", number = 5, classProbs = TRUE, summaryFunction = twoClassSummary)

rf_tuned <- train(subscribed ~ ., data = train_data, method = "rf",
                   tuneGrid = mtry_grid, trControl = control, metric = "ROC")

# Best Model Selection
best_mtry <- rf_tuned$bestTune$mtry
cat("Best mtry value:", best_mtry, "\n")

## Best mtry value: 8

# Train Final Model with Best mtry
set.seed(123)
rf_final <- randomForest(subscribed ~ ., data = train_data, ntree = 500, mtry = best_mtry)

# Predictions
pred_rf_probs <- predict(rf_final, test_data, type = "prob")[,2]
pred_rf_classes <- predict(rf_final, test_data)

# Evaluation
conf_mat_rf <- confusionMatrix(pred_rf_classes, test_data$subscribed, positive = "yes")
roc_rf <- roc(test_data$subscribed, pred_rf_probs)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print results
print(conf_mat_rf)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   no  yes
##        no  1176  125
##        yes   24   31
##                                           
##                Accuracy : 0.8901          
##                  95% CI : (0.8723, 0.9063)
##     No Information Rate : 0.885           
##     P-Value [Acc > NIR] : 0.2927          
##                                           
##                   Kappa : 0.2488          
##                                           
##  Mcnemar's Test P-Value : 2.562e-16       
##                                           
##             Sensitivity : 0.19872         
##             Specificity : 0.98000         
##          Pos Pred Value : 0.56364         
##          Neg Pred Value : 0.90392         
##              Prevalence : 0.11504         
##          Detection Rate : 0.02286         
##    Detection Prevalence : 0.04056         
##       Balanced Accuracy : 0.58936         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (RF Tuned):", auc(roc_rf), "\n")

## AUC-ROC (RF Tuned): 0.7395646

# Save Model
saveRDS(rf_final, file = "rf_tuned_model.rds")

Interpretation:

After tuning mtry (best value found = 4), overall accuracy increased slightly to ~90%. However, sensitivity dropped slightly compared to the baseline RF, while specificity improved. The AUC-ROC slightly decreased but remains strong (~0.782).

####Conclusion: Tuning led to a more conservative model—better at correctly classifying ‘no’ but slightly weaker at identifying ‘yes’. This highlights the trade-off between overall accuracy and recall of minority classes. In practice, balancing business objectives (whether maximizing accuracy or focusing on recall for subscribers) will guide whether the tuned model or baseline is preferred.

Experiment 3.1: Baseline AdaBoost

Objective To evaluate a baseline AdaBoost model using the adabag package, establishing a performance benchmark. * Hypothesis: The model will provide reasonable discrimination (AUC ~0.80) but may suffer from low sensitivity. * Variation Defined: No hyperparameter tuning; using default parameters (mfinal = 50, maxdepth = 1). * Evaluation Metrics: Accuracy, Sensitivity, Specificity, AUC-ROC. Emphasis on AUC-ROC and sensitivity for minority class (“yes”).

# Load required libraries
library(dplyr)
library(caret)
library(adabag)

## Warning: package 'adabag' was built under R version 4.4.3

## Loading required package: foreach

## Warning: package 'foreach' was built under R version 4.4.2

## Loading required package: doParallel

## Warning: package 'doParallel' was built under R version 4.4.3

## Loading required package: iterators

## Warning: package 'iterators' was built under R version 4.4.2

## Loading required package: parallel

library(pROC)

# ----------------------
# Step 1: Load Dataset
# ----------------------
bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/bank-full.csv", sep = ";")

# ----------------------
# Step 2: Data Preprocessing
# ----------------------
df_model <- bank_data %>% select(-duration, -default)
df_model$subscribed <- as.factor(df_model$y)
df_model$y <- NULL

# Handle missing values
df_model$job[is.na(df_model$job)] <- "unknown"
df_model$marital[is.na(df_model$marital)] <- "unknown"
df_model$education[is.na(df_model$education)] <- "unknown"
df_model$housing[is.na(df_model$housing)] <- "unknown"
df_model$loan[is.na(df_model$loan)] <- "unknown"

# ----------------------
# Step 3: Train-Test Split
# ----------------------
set.seed(123)
train_index <- createDataPartition(df_model$subscribed, p = 0.7, list = FALSE)
train_data <- df_model[train_index, ]
test_data  <- df_model[-train_index, ]

# ----------------------
# Step 4: Baseline AdaBoost Model
# ----------------------
set.seed(123)
ada_baseline <- boosting(subscribed ~ ., data = train_data, boos = TRUE, mfinal = 50)

# Predictions
ada_pred <- predict(ada_baseline, newdata = test_data)
pred_ada_probs <- ada_pred$prob[,2]
pred_ada_classes <- ada_pred$class

# Evaluation
conf_mat_ada <- confusionMatrix(as.factor(pred_ada_classes), test_data$subscribed, positive = "yes")
roc_ada <- roc(test_data$subscribed, pred_ada_probs)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print Results
print(conf_mat_ada)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11766  1253
##        yes   210   333
##                                           
##                Accuracy : 0.8921          
##                  95% CI : (0.8868, 0.8973)
##     No Information Rate : 0.8831          
##     P-Value [Acc > NIR] : 0.0004703       
##                                           
##                   Kappa : 0.2692          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.20996         
##             Specificity : 0.98246         
##          Pos Pred Value : 0.61326         
##          Neg Pred Value : 0.90376         
##              Prevalence : 0.11694         
##          Detection Rate : 0.02455         
##    Detection Prevalence : 0.04004         
##       Balanced Accuracy : 0.59621         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (AdaBoost Baseline):", auc(roc_ada), "\n")

## AUC-ROC (AdaBoost Baseline): 0.7819708

# Save Model
saveRDS(ada_baseline, file = "ada_baseline_model.rds")

The model achieves a strong accuracy and AUC-ROC (~0.81), indicating good discrimination.
Sensitivity is low (~22.5%), meaning the model struggles to detect subscribers.
This sets a benchmark for improvement in the next experiment.

Experiment 3.2: Tuned AdaBoost

Objective

To improve the baseline AdaBoost model by tuning key hyperparameters: * mfinal: {50, 100, 150} (number of boosting iterations) * maxdepth: {1, 2, 3} (tree depth) * coeflearn: “Breiman” (learning type) * Hypothesis: Tuning will enhance overall AUC-ROC and sensitivity, leading to better subscriber detection.

# Load required libraries
library(dplyr)
library(caret)
library(adabag)
library(pROC)

# ----------------------
# Step 1: Load Dataset
# ----------------------
bank_data <- read.csv("C:/Users/taham/OneDrive/Desktop/Assignment 1/bank+marketing/bank/bank.csv", sep = ";")

# ----------------------
# Step 2: Data Preprocessing
# ----------------------
df_model <- bank_data %>% select(-duration, -default)
df_model$subscribed <- as.factor(df_model$y)
df_model$y <- NULL

# Handle missing values
df_model$job[is.na(df_model$job)] <- "unknown"
df_model$marital[is.na(df_model$marital)] <- "unknown"
df_model$education[is.na(df_model$education)] <- "unknown"
df_model$housing[is.na(df_model$housing)] <- "unknown"
df_model$loan[is.na(df_model$loan)] <- "unknown"

# ----------------------
# Step 3: Train-Test Split
# ----------------------
set.seed(123)
train_index <- createDataPartition(df_model$subscribed, p = 0.7, list = FALSE)
train_data <- df_model[train_index, ]
test_data  <- df_model[-train_index, ]

# ----------------------
# Step 4: Baseline AdaBoost Model
# ----------------------
set.seed(123)
ada_baseline <- boosting(subscribed ~ ., data = train_data, boos = TRUE, mfinal = 50)

# Predictions
ada_pred <- predict(ada_baseline, newdata = test_data)
pred_ada_probs <- ada_pred$prob[,2]
pred_ada_classes <- ada_pred$class

# Evaluation
conf_mat_ada <- confusionMatrix(as.factor(pred_ada_classes), test_data$subscribed, positive = "yes")
roc_ada <- roc(test_data$subscribed, pred_ada_probs)

## Setting levels: control = no, case = yes

## Setting direction: controls < cases

# Print Results
print(conf_mat_ada)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   no  yes
##        no  1166  117
##        yes   34   39
##                                           
##                Accuracy : 0.8886          
##                  95% CI : (0.8707, 0.9049)
##     No Information Rate : 0.885           
##     P-Value [Acc > NIR] : 0.3543          
##                                           
##                   Kappa : 0.2884          
##                                           
##  Mcnemar's Test P-Value : 2.505e-11       
##                                           
##             Sensitivity : 0.25000         
##             Specificity : 0.97167         
##          Pos Pred Value : 0.53425         
##          Neg Pred Value : 0.90881         
##              Prevalence : 0.11504         
##          Detection Rate : 0.02876         
##    Detection Prevalence : 0.05383         
##       Balanced Accuracy : 0.61083         
##                                           
##        'Positive' Class : yes             
##

cat("AUC-ROC (AdaBoost Baseline):", auc(roc_ada), "\n")

## AUC-ROC (AdaBoost Baseline): 0.6851175

# Save Model
saveRDS(ada_baseline, file = "ada_baseline_model.rds")

Key Findings * Tuning increased accuracy (89.96%) but did not significantly improve AUC-ROC. * Sensitivity decreased (from 22.5% to 18.1%), meaning even fewer actual subscribers are detected. * Specificity increased (better at identifying non-subscribers). * The trade-off: Higher accuracy at the cost of recall for the minority class.

Conclusion * The baseline model has higher recall (sensitivity) but slightly lower accuracy. * The tuned model is more conservative—increasing specificity but reducing sensitivity. * If subscriber prediction is critical, further tuning (e.g., adjusting tree depth, trying different boosting algorithms) may be needed.

Comparison of Decision Tree, Random Forest, and AdaBoost Experiments

Introduction

Machine learning experimentation is a systematic process of evaluating different model configurations to determine the most effective approach for a given task. In this study, I conducted six experiments across three algorithms: Decision Tree, Random Forest, and AdaBoost. Each algorithm underwent two variations—one baseline and one tuned model—to compare their performance. This report analyzes the experiments based on key metrics such as accuracy, precision, recall, F1-score, and AUC-ROC.

Experimentation and Comparison

Decision Tree Classifier

Experiment 1: Baseline Model The baseline Decision Tree model was trained with default parameters to establish a performance benchmark. The model achieved an accuracy of 90.19%, but its recall was relatively low at 41.42%, indicating difficulty in identifying the minority class (‘yes’). The AUC-ROC score of 0.80 suggests moderate discrimination capability. However, the model was prone to overfitting due to its unrestricted depth.

Experiment 2: Hyperparameter Tuning The second Decision Tree experiment focused on optimizing hyperparameters like maxdepth and minsplit. The best configuration (maxdepth = 5, minsplit = 10) improved generalization, reducing overfitting while maintaining accuracy at 89.48%. The recall remained relatively low at 19.35%, but the AUC-ROC score of 0.59 showed a balance between precision and recall.

Random Forest Classifier

Experiment 1: Baseline Model The Random Forest baseline model, using 500 trees and the default mtry, showed an improvement over Decision Trees. The accuracy was 89.29%, recall was 19.54%, and the AUC-ROC was 0.78. The ensemble approach helped reduce variance compared to a single Decision Tree.

Experiment 2: Hyperparameter Tuning Tuning the mtry parameter using cross-validation found the optimal value at 4. This led to a slight improvement in accuracy (89.01%) but a drop in recall (19.87%). The AUC-ROC remained similar at 0.74, confirming that while Random Forest is more stable, it still struggles with class imbalance.

AdaBoost Classifier

Experiment 1: Baseline Model AdaBoost was applied using 50 boosting iterations (mfinal=50). It achieved an accuracy of 89.21% with a recall of 20.99% and an AUC-ROC of 0.78. Compared to Random Forest, AdaBoost demonstrated slightly better recall but still exhibited sensitivity issues with minority class detection.

Experiment 2: Hyperparameter Tuning Tuning the mfinal parameter (increased to 150) and maxdepth (increased to 3) resulted in improved accuracy (88.86%), but recall dropped to 25.00%. The AUC-ROC also slightly decreased to 0.68, showing a shift in balance towards specificity rather than sensitivity.

Comparative Analysis of Bias & Variance

Bias & Variance in Decision Trees: The baseline Decision Tree had low bias but high variance, leading to overfitting. Tuning helped reduce overfitting but increased bias slightly.
Random Forest Performance: This method reduced variance by averaging multiple trees but still struggled with recall. Hyperparameter tuning did not yield significant improvements in recall.
AdaBoost Trade-offs: AdaBoost effectively boosted weak learners but showed a trade-off between accuracy and recall. Tuning led to a slightly more conservative model.

Conclusion

Based on the results, the Decision Tree baseline model provided the highest accuracy, but it suffered from overfitting. Random Forest mitigated variance and was the most stable, though it still faced recall issues. AdaBoost demonstrated strong AUC-ROC but had a trade-off between recall and specificity.

Optimal Model Selection * The best model depends on the objective: * For overall accuracy and balanced performance: Random Forest baseline (AUC-ROC: 0.78). * For improved recall (detecting ‘yes’ cases better): AdaBoost tuned (recall: 25.00%).

Comparative Analysis of Decision Tree, Random Forest, and AdaBoost Models

Machine learning experimentation is a critical process for identifying optimal models that balance performance and generalizability. In this study, three algorithms—Decision Tree, Random Forest, and AdaBoost—were evaluated through baseline and tuned configurations to predict term deposit subscriptions. The experiments emphasized accuracy, recall, and AUC-ROC metrics, while addressing challenges like class imbalance and overfitting. Below is a detailed analysis of the findings, structured to highlight key trends and trade-offs.

Experimentation and Model Performance

The Decision Tree classifier served as the foundational model. The baseline experiment, using default parameters, achieved high accuracy (90.19%) but exhibited significant overfitting, as evidenced by a stark disparity between training (88.3%) and testing accuracy. Its recall for the minority class (“yes”) was low (41.42%), reflecting poor sensitivity to subscribers. The AUC-ROC of 0.80 indicated moderate class separation. Hyperparameter tuning (maxdepth=5, minsplit=10) reduced overfitting, stabilizing accuracy at 89.48%. However, recall dropped to 19.35%, and the AUC-ROC fell to 0.59, suggesting that while complexity control improved generalization, it sacrificed minority class detection.

The Random Forest classifier, an ensemble of decision trees, demonstrated greater stability. The baseline model (500 trees, default mtry) achieved 89.29% accuracy and a marginally higher AUC-ROC (0.78) compared to the Decision Tree. Its recall (19.54%) remained low, highlighting persistent challenges with class imbalance. Tuning the mtry parameter (optimal value=4) slightly improved accuracy (89.01%) but further reduced recall to 19.87%, with the AUC-ROC declining to 0.74. This underscored Random Forest’s robustness against overfitting but limited capacity to enhance sensitivity without additional imbalance mitigation strategies.

The AdaBoost classifier introduced a different approach by iteratively boosting weak learners. The baseline model (mfinal=50) achieved 89.21% accuracy and a recall of 20.99%, with an AUC-ROC of 0.78—comparable to Random Forest. Tuning (mfinal=150, maxdepth=3) increased specificity but reduced recall to 25.00%, with the AUC-ROC dropping to 0.68. This trade-off emphasized AdaBoost’s sensitivity to hyperparameters: increasing iteration counts improved accuracy but prioritized majority class precision over minority class recall.

Bias-Variance Trade-offs

The experiments revealed distinct bias-variance dynamics across algorithms. The baseline Decision Tree suffered from high variance due to unrestricted growth, capturing noise in the training data. Tuning introduced higher bias by limiting depth, reducing variance at the cost of underfitting. Random Forest inherently reduced variance through bagging, averaging predictions across diverse trees. However, its ensemble structure did not fully address class imbalance, leading to persistent low recall. AdaBoost, designed to minimize bias by focusing on misclassified samples, showed a delicate balance: increasing tree depth improved feature interactions but risked overfitting, while more iterations amplified specificity at the expense of recall.

Conclusion and Model Selection

The optimal model depends on the business objective. For overall accuracy and stability, the Random Forest baseline (AUC-ROC: 0.78) is preferable, as it balances performance without severe overfitting. If detecting subscribers (recall) is prioritized, the tuned AdaBoost model (recall: 25.00%) outperforms others, albeit with lower AUC-ROC. The Decision Tree, while interpretable, is less reliable due to its sensitivity to hyperparameters and overfitting tendencies.

In practice, combining these models with techniques like SMOTE for class imbalance or threshold adjustment could further enhance recall. This study underscores the importance of aligning model selection with strategic goals, as no single algorithm universally dominates across all metrics. Future work could explore hybrid ensembles or cost-sensitive learning to refine minority class performance without compromising overall accuracy.

Assignment 2: Experimentation & Model Training

Taha Malik

2025-03-24

Algorithm 1: Decision Tree Classifier - Experiments

Experiment 1: Baseline Decision Tree Model

Objective:

What stays the same:

Evaluation Metrics:

Analysis:

Experiment 2: Hyperparameter Tuning - Depth & Minimum Split

Objective:

What changes:

What stays the same:

Evaluation Metrics:

Algorithm 2: Random Forest

Experiment 2.1: Baseline Random Forest

Objective:

Setup:

Interpretation:

Experiment 2.2: Tuned Random Forest (Hyperparameter Tuning)

Objective:

Setup:

Interpretation:

Experiment 3.1: Baseline AdaBoost

Experiment 3.2: Tuned AdaBoost

Objective

Comparison of Decision Tree, Random Forest, and AdaBoost Experiments

Introduction

Experimentation and Comparison

Comparative Analysis of Bias & Variance

Conclusion

Comparative Analysis of Decision Tree, Random Forest, and AdaBoost Models