Introduction

Customer acquisition and retention are critical for any financial institution. This project analyzes the effectiveness of a bank’s marketing campaign by predicting customer subscription to term deposits. By applying machine learning algorithms (Decision Tree, Random Forest, and AdaBoost), the analysis seeks to identify the most effective model for improving customer targeting and campaign performance.

Objective: The goal is to identify the best-performing model by evaluating key metrics such as accuracy and AUC, ultimately guiding the bank toward more targeted and effective marketing strategies.

Approach: Three machine learning algorithms (Decision Tree, Random Forest, and AdaBoost) will be tested in default and tuned configurations. Performance will be evaluated using AUC, accuracy, sensitivity, and specificity. The best model will be selected based on predictive accuracy and generalization capability, and business recommendations will be provided to improve future marketing strategies.

Getting Started

Load packages

Let’s load the packages.
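
The chunk’s contents were not rendered above, so the list below is inferred from the function calls used later in this report; treat it as a reconstruction rather than the original chunk.

# Packages inferred from the calls used throughout this report
library(knitr)         # kable() tables
library(dplyr)         # pipes and data manipulation
library(tibble)        # tibble() for the results table
library(ggplot2)       # plots
library(caret)         # createDataPartition(), trainControl(), confusionMatrix(), preProcess()
library(rpart)         # decision trees
library(randomForest)  # random forests
library(pROC)          # roc() and AUC
library(ada)           # AdaBoost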

The data

After completing EDA and preprocessing, we saved the cleaned data to a new file, cleaned_data.csv, and now proceed with the next step: Experimentation & Model Training.

# Read a CSV file
bank <- read.csv("https://raw.githubusercontent.com/waheeb123/Data-622/refs/heads/main/cleaned_data.csv")

# Preview the first few rows of the dataset
kable(head(bank, 10), caption = "Preview of the Bank Dataset")
Preview of the Bank Dataset
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome Subscription contact_success_rate age_group credit_risk
30 unemployed married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no 0.0568071 Middle-aged Medium Risk
35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no 0.1069652 Middle-aged Medium Risk
30 management married tertiary no 1476 yes yes unknown 3 jun 199 4 -1 0 unknown no 0.0568071 Middle-aged High Risk
59 blue-collar married secondary no 0 yes no unknown 5 may 226 1 -1 0 unknown no 0.0568071 Senior Medium Risk
35 management single tertiary no 747 no no cellular 23 feb 141 2 176 3 failure no 0.1069652 Middle-aged Medium Risk
36 self-employed married tertiary no 307 yes no cellular 14 may 341 1 330 2 other no 0.1428571 Middle-aged Medium Risk
39 technician married secondary no 147 yes no cellular 6 may 151 2 -1 0 unknown no 0.0568071 Middle-aged Medium Risk
41 entrepreneur married tertiary no 221 yes no unknown 14 may 57 2 -1 0 unknown no 0.0568071 Middle-aged Medium Risk
43 services married primary no -88 yes yes cellular 17 apr 313 1 147 2 failure no 0.1069652 Middle-aged High Risk
43 admin. married secondary no 264 yes no cellular 17 apr 113 2 -1 0 unknown no 0.0568071 Middle-aged Medium Risk

Experimentation & Model Training

To evaluate the models, we split the data into training (70%) and test (30%) sets, converted the Subscription target to a factor, and defined a 10-fold cross-validation control for reliable model assessment. We also created scaled copies of the numeric features; scaling is optional for the tree-based models used here, which are insensitive to feature scale. We then built a Decision Tree model using default settings.

# Split data into train and test sets
set.seed(123)
trainIndex <- createDataPartition(bank$Subscription, p = 0.7, list = FALSE)
data_train <- bank[trainIndex, ]
data_test <- bank[-trainIndex, ]

# Convert 'Subscription' column to factor for classification
data_train$Subscription <- as.factor(data_train$Subscription)
data_test$Subscription <- as.factor(data_test$Subscription)

# Scale numeric features (optional for tree-based models).
# To avoid leakage, the test set is scaled with the training set's
# centering/scaling parameters rather than its own.
scaler <- preProcess(data_train, method = c("center", "scale"))
data_train_scaled <- predict(scaler, data_train)
data_test_scaled <- predict(scaler, data_test)

# Set up cross-validation control (10-fold cross-validation)
train_control <- trainControl(method = "cv", number = 10, 
                              savePredictions = "all", 
                              classProbs = TRUE, 
                              summaryFunction = twoClassSummary)
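
Note that train_control is defined here but the models below are fit directly with rpart(), randomForest(), and ada() rather than through caret. As a minimal sketch (illustrative, not part of the original pipeline), the control object could drive cross-validated training like this:

# Sketch: 10-fold CV decision tree via caret::train (illustrative only)
# twoClassSummary reports ROC/Sens/Spec, so we optimize the ROC metric
cv_dt <- train(Subscription ~ ., data = data_train,
               method = "rpart",
               trControl = train_control,
               metric = "ROC",
               tuneLength = 10)  # evaluates 10 candidate cp values
cv_dt$bestTune                   # cp selected by cross-validation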

Experiment 1: Decision Tree (Default)

Objective: Test the default decision tree model to evaluate its baseline performance on the classification task.

Variation: No tuning applied; default settings used for benchmarking.

Variation is meaningful as it sets a baseline to measure tuning effectiveness.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC will be computed.

# Build Decision Tree model with default settings
dt_model <- rpart(Subscription ~ ., data = data_train, method = "class")

# Predict using test data (column 2 of the probability matrix is the
# "yes" class, since columns follow the factor-level order "no", "yes")
dt_probs <- predict(dt_model, data_test, type = "prob")[, 2]
dt_preds <- predict(dt_model, data_test, type = "class")

# Evaluate metrics
# Note: confusionMatrix() treats the first factor level ("no") as the
# positive class by default, so Sensitivity and Pos Pred Value below
# describe the "no" class; pass positive = "yes" to focus on subscribers.
dt_confusion <- confusionMatrix(dt_preds, data_test$Subscription)
dt_accuracy <- dt_confusion$overall['Accuracy']
dt_precision <- dt_confusion$byClass['Pos Pred Value']
dt_recall <- dt_confusion$byClass['Sensitivity']
dt_f1 <- 2 * (dt_precision * dt_recall) / (dt_precision + dt_recall)
dt_auc <- roc(data_test$Subscription, dt_probs)$auc

cat(sprintf("\nDecision Tree (Default) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            dt_accuracy, dt_precision, dt_recall, dt_f1, dt_auc))
## 
## Decision Tree (Default) - Accuracy: 0.9301, Precision: 0.9365, Recall: 0.9912, F1-score: 0.9631, AUC: 0.7676
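
To see which variables drive the splits, the fitted tree can be drawn. A minimal sketch, assuming the rpart.plot package (not loaded in the original chunks) is available:

# Sketch: visualize the default tree's splits (assumes rpart.plot is installed)
library(rpart.plot)
rpart.plot(dt_model, main = "Decision Tree (Default)")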

Experiment 2: Decision Tree (Tuned)

Objective: Optimize decision tree model performance by adjusting hyperparameters.

Variation: Tuning complexity parameter (cp) and maximum tree depth (maxdepth).

Variation is meaningful since tuning cp and maxdepth impacts overfitting and model complexity.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC will be computed.

# Build tuned Decision Tree model
dt_tuned <- rpart(Subscription ~ ., data = data_train, method = "class", 
                  control = rpart.control(cp = 0.01, maxdepth = 5))

# Predict using test data
dt_tuned_probs <- predict(dt_tuned, data_test, type = "prob")[, 2]
dt_tuned_preds <- predict(dt_tuned, data_test, type = "class")

# Evaluate metrics
dt_tuned_confusion <- confusionMatrix(dt_tuned_preds, data_test$Subscription)
dt_tuned_accuracy <- dt_tuned_confusion$overall['Accuracy']
dt_tuned_precision <- dt_tuned_confusion$byClass['Pos Pred Value']
dt_tuned_recall <- dt_tuned_confusion$byClass['Sensitivity']
dt_tuned_f1 <- 2 * (dt_tuned_precision * dt_tuned_recall) / (dt_tuned_precision + dt_tuned_recall)
dt_tuned_auc <- roc(data_test$Subscription, dt_tuned_probs)$auc

cat(sprintf("\nDecision Tree (Tuned) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            dt_tuned_accuracy, dt_tuned_precision, dt_tuned_recall, dt_tuned_f1, dt_tuned_auc))
## 
## Decision Tree (Tuned) - Accuracy: 0.9301, Precision: 0.9365, Recall: 0.9912, F1-score: 0.9631, AUC: 0.7676

Experiment 1 vs Experiment 2

Model: Decision Tree (Default) and Decision Tree (Tuned)

I applied hyperparameter tuning to the decision tree using rpart.control(cp = 0.01, maxdepth = 5).

Hyperparameters: cp = 0.01 is a pruning parameter that controls complexity (smaller values allow deeper trees); maxdepth = 5 caps tree depth to guard against overfitting.

What I learned: surprisingly at first, tuning did not change performance; the metrics were identical to the default model. In hindsight this makes sense: cp = 0.01 is already rpart’s default, so the only effective change was the depth cap, and the identical metrics indicate the default tree never grew past five levels anyway.

Conclusion: Tuning a single decision tree may have limited impact given the model’s inherent simplicity; future steps should explore ensemble methods to better capture complex patterns.
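
A more meaningful single-tree tuning pass would grow a deliberately deep tree and then prune it at the cp value minimizing cross-validated error; a minimal sketch (illustrative, not part of the original analysis):

# Sketch: grow a deep tree, then prune at the cp minimizing CV error
dt_full <- rpart(Subscription ~ ., data = data_train, method = "class",
                 control = rpart.control(cp = 0.0001))
printcp(dt_full)  # cross-validated error for each candidate cp
best_cp <- dt_full$cptable[which.min(dt_full$cptable[, "xerror"]), "CP"]
dt_pruned <- prune(dt_full, cp = best_cp)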

We now move on to ensemble models, Random Forest and AdaBoost, in search of improved performance.

Experiment 3: Random Forest (Default)

Objective: Evaluate the baseline performance of a Random Forest model on the classification task.

Variation: No hyperparameter tuning; ntree = 100 with the default mtry. (Note that the randomForest package default is ntree = 500.)

Variation is meaningful because it establishes a benchmark for comparison with tuned models.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC.

# Build Random Forest model (ntree = 100; all other settings default)
rf_model <- randomForest(Subscription ~ ., data = data_train, ntree = 100)

# Predict using test data
rf_probs <- predict(rf_model, data_test, type = "prob")[, 2]
rf_preds <- predict(rf_model, data_test, type = "class")

# Evaluate metrics
rf_confusion <- confusionMatrix(rf_preds, data_test$Subscription)
rf_accuracy <- rf_confusion$overall['Accuracy']
rf_precision <- rf_confusion$byClass['Pos Pred Value']
rf_recall <- rf_confusion$byClass['Sensitivity']
rf_f1 <- 2 * (rf_precision * rf_recall) / (rf_precision + rf_recall)
rf_auc <- roc(data_test$Subscription, rf_probs)$auc

cat(sprintf("\nRandom Forest (Default) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            rf_accuracy, rf_precision, rf_recall, rf_f1, rf_auc))
## 
## Random Forest (Default) - Accuracy: 0.9283, Precision: 0.9363, Recall: 0.9893, F1-score: 0.9621, AUC: 0.9119
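
Beyond the aggregate metrics, the forest’s variable-importance scores indicate which customer attributes drive the predictions, which speaks directly to the targeting question motivating this project. A minimal sketch:

# Sketch: inspect which features the forest relies on most
importance(rf_model)  # mean decrease in Gini per feature
varImpPlot(rf_model, main = "Random Forest Variable Importance")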

Experiment 4: Random Forest (Tuned)

Objective: Improve Random Forest performance by tuning hyperparameters.

Variation: Increased ntree to 200 and adjusted mtry to 4.

Variation is meaningful because increasing ntree reduces variance, and adjusting mtry balances bias-variance tradeoff.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC.

# Build tuned Random Forest model
rf_tuned <- randomForest(Subscription ~ ., data = data_train, ntree = 200, mtry = 4)

# Predict using test data
rf_tuned_probs <- predict(rf_tuned, data_test, type = "prob")[, 2]
rf_tuned_preds <- predict(rf_tuned, data_test, type = "class")

# Evaluate metrics
rf_tuned_confusion <- confusionMatrix(rf_tuned_preds, data_test$Subscription)
rf_tuned_accuracy <- rf_tuned_confusion$overall['Accuracy']
rf_tuned_precision <- rf_tuned_confusion$byClass['Pos Pred Value']
rf_tuned_recall <- rf_tuned_confusion$byClass['Sensitivity']
rf_tuned_f1 <- 2 * (rf_tuned_precision * rf_tuned_recall) / (rf_tuned_precision + rf_tuned_recall)
rf_tuned_auc <- roc(data_test$Subscription, rf_tuned_probs)$auc

cat(sprintf("\nRandom Forest (Tuned) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            rf_tuned_accuracy, rf_tuned_precision, rf_tuned_recall, rf_tuned_f1, rf_tuned_auc))
## 
## Random Forest (Tuned) - Accuracy: 0.9265, Precision: 0.9362, Recall: 0.9873, F1-score: 0.9611, AUC: 0.9077

Experiment 3 vs Experiment 4

Model: Random Forest (Default) and Random Forest (Tuned)

What changed: I increased ntree from 100 to 200 and set mtry = 4, the number of features considered at each split, in place of the default of roughly sqrt(p) (where p is the number of predictors).

What I learned: the hand-picked configuration did not help; accuracy, recall, and AUC all dipped slightly relative to the default forest (AUC 0.912 vs. 0.908). The default mtry was evidently already well matched to this feature set, and adding trees alone cannot compensate for a poorly chosen split width.

Conclusion: Random Forest is strong out of the box; any real tuning gains will require a systematic search over mtry, for example with tuneRF() as sketched below, rather than a one-off adjustment.
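
A minimal sketch of that search, assuming data_train from above (tuneRF() needs a plain predictor frame, with character columns converted to factors):

# Sketch: stepwise mtry search with randomForest::tuneRF (illustrative)
x_train <- data_train[, setdiff(names(data_train), "Subscription")]
x_train[] <- lapply(x_train, function(v) if (is.character(v)) factor(v) else v)
set.seed(123)
rf_search <- tuneRF(x_train, data_train$Subscription,
                    ntreeTry = 200,    # trees grown per candidate mtry
                    stepFactor = 1.5,  # multiply/divide mtry by this each step
                    improve = 0.01,    # continue while OOB error improves by 1%
                    doBest = TRUE)     # refit a forest at the best mtry found
rf_search$mtry                         # mtry chosen by the search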

Experiment 5: AdaBoost (Default)

Objective: Evaluate the baseline performance of an AdaBoost model on the classification task.

Variation: No tuning applied; using default iter = 50.

Variation is meaningful because it establishes a benchmark for comparison with tuned models.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC.

# AdaBoost (Default) Model
library(ada)

# Train AdaBoost model
ada_model <- ada(Subscription ~ ., data = data_train, iter = 50)

# Predict using test data
ada_probs <- predict(ada_model, data_test, type = "prob")[, 2]
ada_preds <- predict(ada_model, data_test, type = "class")

# Evaluate metrics
ada_confusion <- confusionMatrix(ada_preds, data_test$Subscription)
ada_accuracy <- ada_confusion$overall['Accuracy']
ada_precision <- ada_confusion$byClass['Pos Pred Value']
ada_recall <- ada_confusion$byClass['Sensitivity']
ada_f1 <- 2 * (ada_precision * ada_recall) / (ada_precision + ada_recall)
ada_auc <- roc(data_test$Subscription, ada_probs)$auc

cat(sprintf("\nAdaBoost (Default) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            ada_accuracy, ada_precision, ada_recall, ada_f1, ada_auc))
## 
## AdaBoost (Default) - Accuracy: 0.9265, Precision: 0.9362, Recall: 0.9873, F1-score: 0.9611, AUC: 0.9034

Experiment 6: AdaBoost (Tuned)

Objective: Improve AdaBoost performance by tuning hyperparameters.

Variation: Increased the number of boosting iterations from 50 to 100.

Variation is meaningful because increasing iterations can reduce bias and improve model performance.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC.

# AdaBoost (Tuned) Model
# Tuning AdaBoost model - for example, increase iterations
ada_tuned_model <- ada(Subscription ~ ., data = data_train, iter = 100)

# Predict using test data
ada_tuned_probs <- predict(ada_tuned_model, data_test, type = "prob")[, 2]
ada_tuned_preds <- predict(ada_tuned_model, data_test, type = "class")

# Evaluate metrics
ada_tuned_confusion <- confusionMatrix(ada_tuned_preds, data_test$Subscription)
ada_tuned_accuracy <- ada_tuned_confusion$overall['Accuracy']
ada_tuned_precision <- ada_tuned_confusion$byClass['Pos Pred Value']
ada_tuned_recall <- ada_tuned_confusion$byClass['Sensitivity']
ada_tuned_f1 <- 2 * (ada_tuned_precision * ada_tuned_recall) / (ada_tuned_precision + ada_tuned_recall)
ada_tuned_auc <- roc(data_test$Subscription, ada_tuned_probs)$auc

cat(sprintf("\nAdaBoost (Tuned) - Accuracy: %.4f, Precision: %.4f, Recall: %.4f, F1-score: %.4f, AUC: %.4f\n", 
            ada_tuned_accuracy, ada_tuned_precision, ada_tuned_recall, ada_tuned_f1, ada_tuned_auc))
## 
## AdaBoost (Tuned) - Accuracy: 0.9265, Precision: 0.9362, Recall: 0.9873, F1-score: 0.9611, AUC: 0.9038

Experiment 5 vs Experiment 6

Model: AdaBoost (Default) and AdaBoost (Tuned)

What changed: I doubled the number of boosting iterations from the ada default of 50 to 100; the base-learner settings were left at their defaults.

Hyperparameters: iter controls the number of boosting rounds; more rounds can reduce bias but add computation and, eventually, overfitting risk.

What I learned: doubling the iterations left accuracy, precision, recall, and F1 unchanged and moved AUC only marginally (0.9034 vs. 0.9038). The default settings were already close to optimal for this data.

Conclusion: AdaBoost is highly performant even with default parameters. Going forward, I would experiment with the learning rate (nu) and base-learner regularization rather than iteration count alone; a sketch follows.
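
A minimal sketch of that direction, assuming the same training data (nu is ada’s shrinkage argument and control forwards settings to the rpart base learners):

# Sketch: smaller learning rate with shallow base learners (illustrative)
ada_slow <- ada(Subscription ~ ., data = data_train,
                iter = 200,                              # more rounds to offset shrinkage
                nu = 0.05,                               # below ada's default of 0.1
                control = rpart.control(maxdepth = 3))   # shallow base trees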

Results and Visualization

Print the result in a table

Objective: Compare model performance across Decision Tree, Random Forest, and AdaBoost variations.

Variation: Models were tuned to assess impact on accuracy and AUC.

Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC.

# Store results for all six experiments
results <- tibble(
  Model = c("Decision Tree (Default)", "Decision Tree (Tuned)", 
            "Random Forest (Default)", "Random Forest (Tuned)", 
            "AdaBoost (Default)", "AdaBoost (Tuned)"),
  Accuracy = c(dt_accuracy, dt_tuned_accuracy, rf_accuracy, rf_tuned_accuracy, 
               ada_accuracy, ada_tuned_accuracy),
  Precision = c(dt_precision, dt_tuned_precision, rf_precision, rf_tuned_precision, 
                ada_precision, ada_tuned_precision),
  Recall = c(dt_recall, dt_tuned_recall, rf_recall, rf_tuned_recall, 
             ada_recall, ada_tuned_recall),
  F1_Score = c(dt_f1, dt_tuned_f1, rf_f1, rf_tuned_f1, 
               ada_f1, ada_tuned_f1),
  AUC = c(dt_auc, dt_tuned_auc, rf_auc, rf_tuned_auc, 
          ada_auc, ada_tuned_auc)
)

# Plot AUC Comparison
ggplot(results, aes(x = reorder(Model, AUC), y = AUC, fill = Model)) +
  geom_bar(stat = "identity", color = "black") +
  coord_flip() +
  theme_minimal() +
  labs(title = "AUC Comparison Across Models", x = "Model", y = "AUC")

The table below summarizes model performance. Random Forest showed the highest AUC, indicating the strongest ability to rank likely subscribers, while AdaBoost demonstrated similarly balanced performance.

# Display Results
print(results)
## # A tibble: 6 × 6
##   Model                   Accuracy Precision Recall F1_Score   AUC
##   <chr>                      <dbl>     <dbl>  <dbl>    <dbl> <dbl>
## 1 Decision Tree (Default)    0.930     0.936  0.991    0.963 0.768
## 2 Decision Tree (Tuned)      0.930     0.936  0.991    0.963 0.768
## 3 Random Forest (Default)    0.928     0.936  0.989    0.962 0.912
## 4 Random Forest (Tuned)      0.927     0.936  0.987    0.961 0.908
## 5 AdaBoost (Default)         0.927     0.936  0.987    0.961 0.903
## 6 AdaBoost (Tuned)           0.927     0.936  0.987    0.961 0.904
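
For a report-quality table, the same kable() used for the data preview could format these results; a minimal sketch:

# Sketch: formatted results table, matching the kable style used earlier
kable(results, digits = 3, caption = "Model Performance Comparison")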

Based on the results, Random Forest is the most effective model family for predicting customer subscription to term deposits; notably, its default configuration achieved the highest AUC (0.912), slightly ahead of the hand-tuned variant (0.908). The bank should focus on a systematic Random Forest hyperparameter search and on combining ensemble models to improve generalization.

Final Takeaways: What I Learned Across All Experiments

Model            Tuning Impact      Insight
Decision Tree    None               Default tree already balanced; tuning (cp = 0.01, maxdepth = 5) reproduced the same tree.
Random Forest    Slightly negative  Hand-picked ntree = 200, mtry = 4 edged AUC and recall down; a systematic mtry search is warranted.
AdaBoost         Negligible         Strong out of the box; doubling iterations barely moved AUC.

Conclusion

The objective of this project was to analyze the effectiveness of a bank’s marketing campaign and predict customer subscription to term deposits using three machine learning models: Decision Tree, Random Forest, and AdaBoost. Through systematic experimentation and tuning, the models were evaluated based on key performance metrics, including accuracy, precision, recall, F1-score, and AUC.

Key Findings: Decision Tree: The default Decision Tree model demonstrated high recall but only moderate AUC (0.768), indicating weak ability to separate subscribers from non-subscribers. Tuning the complexity parameter (cp) and tree depth reproduced the default tree, leaving every metric unchanged.

Random Forest: The Random Forest model exhibited strong predictive power, with the highest AUC and balanced performance across accuracy, precision, and recall; the hand-tuned variant trailed the default configuration slightly. AdaBoost: AdaBoost achieved competitive performance with high recall and strong AUC. Doubling the boosting rounds left accuracy, precision, recall, and F1 unchanged and moved AUC only marginally, suggesting the defaults were already near-optimal.

Best Model: Random Forest emerged as the best-performing model, achieving the highest AUC (0.912, in its default configuration) and consistent predictive accuracy. Its ensemble structure captures complex patterns while damping overfitting, making it the most reliable choice for customer targeting.

Recommendations: The bank should deploy the Random Forest model for future marketing campaigns to improve customer targeting and conversion rates. Further improvements can be pursued through a systematic hyperparameter search (for example, selecting mtry by cross-validation) and through feature selection. Combining ensemble methods such as AdaBoost and Random Forest may further enhance predictive accuracy and generalization; a minimal combination sketch follows.
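
A minimal sketch of such a combination, assuming the probability vectors from the experiments above (simple soft voting with illustrative equal weights):

# Sketch: average the two ensembles' predicted probabilities (soft voting)
ens_probs <- 0.5 * rf_probs + 0.5 * ada_probs
ens_auc   <- roc(data_test$Subscription, ens_probs)$auc
cat(sprintf("Ensemble (RF + AdaBoost) AUC: %.4f\n", ens_auc))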