A Portuguese bank conducted a marketing campaign (phone calls) to persuade clients to subscribe to a term deposit. The records of these efforts are available in the form of a dataset. The objective here is to apply machine learning techniques to analyze the dataset and identify the most effective tactics that will help the bank persuade more customers to subscribe to a term deposit in its next campaign.
This assignment consists of conducting experiments with different algorithms: Decision Trees, Random Forest and AdaBoost. For each experiment we define what we are trying to achieve (before each run), conduct the experiment, and review how it went at the end. These experiments allow us to compare algorithms and choose an optimal model.
Libraries
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.1
Warning: package 'lubridate' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)
Warning: package 'skimr' was built under R version 4.5.1
library(naniar)
Warning: package 'naniar' was built under R version 4.5.1
Attaching package: 'naniar'
The following object is masked from 'package:skimr':
n_complete
library(DataExplorer)
Warning: package 'DataExplorer' was built under R version 4.5.1
library(GGally)
Warning: package 'GGally' was built under R version 4.5.1
library(corrplot)
Warning: package 'corrplot' was built under R version 4.5.1
corrplot 0.95 loaded
library(vip)
Warning: package 'vip' was built under R version 4.5.1
Attaching package: 'vip'
The following object is masked from 'package:utils':
vi
library(lubridate)
library(knitr)
Warning: package 'knitr' was built under R version 4.5.1
library(future)
library(furrr)
Warning: package 'furrr' was built under R version 4.5.1
library(kableExtra)
Warning: package 'kableExtra' was built under R version 4.5.1
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
library(randomForest)
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:dplyr':
combine
The following object is masked from 'package:ggplot2':
margin
library(ranger)
Attaching package: 'ranger'
The following object is masked from 'package:randomForest':
importance
library(adabag)
Warning: package 'adabag' was built under R version 4.5.1
Loading required package: caret
Loading required package: lattice
Attaching package: 'caret'
The following object is masked from 'package:future':
cluster
The following objects are masked from 'package:yardstick':
precision, recall, sensitivity, specificity
The following object is masked from 'package:rsample':
calibration
The following object is masked from 'package:purrr':
lift
Loading required package: foreach
Attaching package: 'foreach'
The following objects are masked from 'package:purrr':
accumulate, when
Loading required package: doParallel
Warning: package 'doParallel' was built under R version 4.5.1
set.seed(489) ## let's define a fixed seed, also for reproducibility
The file uses semicolons instead of commas, so we have to account for that. Also, when exploring the CSV some “unknown” values were found, so we’ll turn those into NAs.
Global set up
# Loading, and converting "unknown" and "" to NA; clean names to snake_case
uci_raw <- read_delim(url, delim = ";", show_col_types = FALSE, na = c("", "unknown")) |>
  clean_names()
# Remove exact duplicate rows (we know there are very few, but it's good practice)
uci_raw <- uci_raw |> distinct()
Now we’ll define a function that prepares the data following our findings from assignment 1. We’ll call it prep_features and apply it to the global dataset, since all the experiments will benefit from it and we’ll have a uniform starting point.
prep_features <- function(df) {
  df |>
    # Target as a factor with levels (no, yes)
    mutate(y = factor(y, levels = c("no", "yes"))) |>
    # we'll drop the leakage feature, since duration is known only after the call
    select(-duration) |>
    # to handle "pdays" we'll create a "no prior contact" flag and a cleaned numeric one
    mutate(
      no_prior   = if_else(pdays == 999, "yes", "no"),
      pdays_real = if_else(pdays == 999, NA, pdays),
      # and the recent contact flag
      recent_contact = if_else(!is.na(pdays_real) & pdays_real < 7, "recent", "not_recent"),
      # also some contact intensity bands from campaign
      intensity = case_when(
        campaign <= 2 ~ "low",
        campaign <= 5 ~ "medium",
        TRUE          ~ "high"
      ),
      # we had defined age buckets to segment the data for easy handling
      age_bucket = cut(age, breaks = c(0, 30, 45, 60, Inf),
                       labels = c("<=30", "31-45", "46-60", "60+"), right = TRUE),
      # Creating a flag for "prior outcome" (whether there is any prior outcome)
      had_prior = if_else(poutcome == "nonexistent", "no", "yes")
    ) |>
    # finally, keep categorical columns as factors so trees can split on them
    mutate(
      across(where(is.character), ~ factor(.x)),
      across(c(no_prior, recent_contact, intensity, age_bucket, had_prior), ~ factor(.x))
    )
}
Then, we apply it to the dataset
uci <-prep_features(uci_raw)
Additionally, we want to reduce macro collinearity by dropping one near-duplicate macro variable. We’ll keep euribor3m and drop nr_employed. This is optional for trees, but we do it to stabilize comparisons.
if ("nr_employed"%in%names(uci)) { uci <- uci |>select(-nr_employed)}
Train and Test split, CV folds
We will do an 80/20 stratified split for the final holdout, with stratified CV folds built on the training portion.
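The split code itself is not reproduced in this report; a minimal sketch with rsample (which appears to be loaded, given the masking messages above), using the train_df/test_df names referenced later:
# Sketch only: 80/20 stratified split plus CV folds on the training portion
split_obj <- rsample::initial_split(uci, prop = 0.8, strata = y)
train_df  <- rsample::training(split_obj)
test_df   <- rsample::testing(split_obj)
folds     <- rsample::vfold_cv(train_df, v = 5, strata = y)   # stratified CV for tuning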
To mitigate imbalance, we’ll give a higher weight to the minority class “yes”, inversely proportional to its frequency. We’ll use a function for this called compute_class_weights.
compute_class_weights <- function(df, y_col = "y") {
  tab <- table(df[[y_col]])
  # Inverse frequency, rescaled so that the mean weight is 1
  w <- 1 / as.numeric(tab)
  names(w) <- names(tab)
  w <- w / mean(w)
  return(w)
}
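As a usage example (assuming the train_df object from the split sketch above):
class_wts <- compute_class_weights(train_df)
class_wts  # roughly no = 0.22, yes = 1.78 with ~11% positives (illustrative values)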
Since we will be running multiple experiments we’ll prepare a log where we can add our results as we make progress.
# defining the log
# creating an empty log with the exact schema we want to register
init_exp_log <- function() {
  tibble::tibble(
    id         = character(),
    model      = character(),
    objective  = character(),
    variation  = character(),
    controls   = character(),
    metrics    = character(),
    result     = character(),
    conclusion = character(),
    recommend  = character()
  )
}
# defining function to populate the log with results
# To be safe, we add an appender that enforces the same column set/order every time
log_experiment <- function(log, id, model, objective, variation, controls, metrics,
                           result, conclusion, recommend) {
  new_row <- tibble::tibble(
    id         = as.character(id),
    model      = as.character(model),
    objective  = as.character(objective),
    variation  = as.character(variation),
    controls   = as.character(controls),
    metrics    = as.character(metrics),
    result     = as.character(result),
    conclusion = as.character(conclusion),
    recommend  = as.character(recommend)
  )
  # and if the log is missing or has a different schema, it re-starts cleanly
  if (is.null(log) || !identical(names(log), names(new_row))) {
    log <- init_exp_log()
  }
  dplyr::bind_rows(log, new_row)
}
fmt_res <- function(m) {
  # m is a 1-row tibble with pr_auc, roc_auc, rec_at10
  sprintf("Test PR-AUC=%.4f, ROC-AUC=%.4f, Recall@10%%=%.4f", m$pr_auc, m$roc_auc, m$rec_at10)
}
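The metric computation itself (PR-AUC, ROC-AUC, Recall@Top10%) is not reproduced in this report; a minimal sketch with yardstick (assumed loaded, given the caret masking messages above), where compute_metrics is our own placeholder name:
# Hypothetical helper (name is ours): metrics from a predictions tibble with
# columns .pred_yes (P(yes)) and y (factor with levels no/yes).
compute_metrics <- function(preds) {
  n_top <- ceiling(0.10 * nrow(preds))                                     # 10% call budget
  top   <- preds |> dplyr::arrange(dplyr::desc(.pred_yes)) |> dplyr::slice_head(n = n_top)
  tibble::tibble(
    pr_auc   = yardstick::pr_auc_vec(preds$y, preds$.pred_yes, event_level = "second"),
    roc_auc  = yardstick::roc_auc_vec(preds$y, preds$.pred_yes, event_level = "second"),
    rec_at10 = sum(top$y == "yes") / sum(preds$y == "yes")                 # share of all "yes" in top decile
  )
}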
exp_log <-init_exp_log()
Decision Trees
We will implement the following experiments:
DT1: Establish a baseline tree versus a cost-sensitive tree
Objective: To improve minority recall without collapsing precision by introducing cost sensitivity.
Variation: We’ll add a class cost matrix, class weights and tune cp (complexity), minsplit, maxdepth. We will then compare to a plain baseline.
Controls: Same features, same train/validation split, same CV, no SMOTE.
Metrics: PR-AUC, Recall@Top10%, ROC-AUC.
Hypothesis: The cost sensitive pruned tree will raise recall@top10% and PR-AUC relative to baseline.
Decision rule: If PR-AUC improves by more than 3% absolute or recall@top10% improves by more than 5 points, we’ll recommend the cost sensitive settings.
DT2: Add interaction aware features, plus some depth control
Objective: For this second experiment we’ll test whether simple domain features (for example recent × intensity, age buckets, edu×job) can boost ranking power while controlling variance.
Variation: We are adding engineered features; restricting maxdepth and tuning cp/minbucket to avoid overfitting.
Controls: Same imbalance strategy as DT1’s better variant; same split and CV.
Metrics: PR-AUC, Recall@Top10%, tree size as complexity proxy.
Hypothesis: Our new engineered features plus some depth control yields higher PR-AUC with a smaller tree than DT1.
Decision rule: We’ll prefer DT2 if PR-AUC improves and the tree size is reduced or unchanged.
DT1
We’ll start by defining a small hyperparameter grid
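The grid is not reproduced here; a sketch with illustrative values (the exact grid used may differ):
# Illustrative DT-1 grid: complexity, minimum split size, depth, and weighted vs. plain
dt1_grid <- tidyr::expand_grid(
  cp       = c(0.001, 0.005, 0.01),
  minsplit = c(20, 50),
  maxdepth = c(3, 5, 8),
  weighted = c(FALSE, TRUE)
)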
The holdout PR-AUC is very close to CV (with only a small drop), and recall@top10% actually improves, which suggests the class-weighted, pruned tree is not overfitting and is ranking positives reasonably well.
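For reference, the refit that produces final_dt1 (plotted next) is not shown above; a hedged sketch with rpart, assuming pick_dt1 holds the selected grid row:
# Sketch only: refit the winning DT-1 configuration on the full training data
row_w <- if (isTRUE(pick_dt1$weighted)) {
  cw <- compute_class_weights(train_df)
  as.numeric(cw[as.character(train_df$y)])     # per-row weights from class weights
} else {
  rep(1, nrow(train_df))
}
final_dt1 <- rpart::rpart(
  y ~ ., data = randomForest::na.roughfix(train_df), weights = row_w,
  method  = "class",
  control = rpart::rpart.control(cp = pick_dt1$cp, minsplit = pick_dt1$minsplit,
                                 maxdepth = pick_dt1$maxdepth)
)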
We can also plot the final tree for dt1
rpart.plot(final_dt1, type =2, extra =101, fallen.leaves =TRUE)
DT1 Results
We can see the first splits are macro variables: euribor3m then cons_conf_idx, indicating macro conditions dominate early decisions.
A later split on month = “may” shows that that month is less favorable than others, which is consistent with the findings from the EDA, where May had one of the lowest conversion rates.
Also, the leaves on the right side (low consumer confidence, certain months) have higher “yes” proportions, which aligns with the idea that clients are more receptive under certain economic conditions and at certain times of year.
All of this tells us that the cost sensitivity and pruning seem to have worked: with a compact tree we achieved a decent PR-AUC and even improved operational recall at a fixed calling quota (top 10%).
The model’s stability from CV to test also indicates we can continue using this setup as the DT baseline for the experiment table.
At this point we can finally add the experiment to our log
# Logging the experiment
exp_log <- log_experiment(
  exp_log,
  id         = "DT-1",
  model      = "Decision Tree",
  objective  = "Improve minority recall with class weights and pruning.",
  variation  = paste0("Grid over cp/minsplit/maxdepth; weights=", pick_dt1$weighted),
  controls   = "Same features/split/CV; no SMOTE.",
  metrics    = "PR-AUC (primary), Recall@Top10%, ROC-AUC",
  result     = fmt_res(dt1_test_metrics),
  conclusion = "Weighted and pruned tree outperformed baseline on PR-AUC/Recall@10%.",
  recommend  = "Adopt cost-sensitive settings with selected cp/minsplit/maxdepth."
)
exp_log
# A tibble: 1 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
DT2
For the second experiment we’ll start by creating a modeling copy with explicit interaction columns
# Count interaction levels on TRAIN ONLY
tmp_train_interactions <- train_df |>
  mutate(
    recent_x_intensity = interaction(recent_contact, intensity, drop = TRUE),
    edu_x_job          = interaction(education, job, drop = TRUE)
  )
# We'll keep reasonably frequent levels so both model and test can share them
# (the threshold can be tuned; we'll start at 50)
keep_recent_intensity <- names(which(table(tmp_train_interactions$recent_x_intensity) >= 50))
keep_edu_job          <- names(which(table(tmp_train_interactions$edu_x_job) >= 50))
# this helper builds DT-2 features and lumps rare levels into "Other" using TRAIN-derived keep lists
make_dt2_frame <- function(df, keep_recent_intensity, keep_edu_job) {
  df |>
    mutate(
      recent_x_intensity = interaction(recent_contact, intensity, drop = TRUE),
      edu_x_job          = interaction(education, job, drop = TRUE)
    ) |>
    mutate(
      recent_x_intensity = forcats::fct_other(recent_x_intensity,
                                              keep = keep_recent_intensity, other_level = "Other"),
      edu_x_job = forcats::fct_other(edu_x_job, keep = keep_edu_job, other_level = "Other")
    ) |>
    mutate(across(where(is.character), ~ factor(.x)))
}
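For completeness, a sketch of applying the helper with the TRAIN-derived keep lists to produce the frames used later (train_df/test_df names assumed from the split sketch above):
train_df_dt2 <- make_dt2_frame(train_df, keep_recent_intensity, keep_edu_job)
test_df_dt2  <- make_dt2_frame(test_df,  keep_recent_intensity, keep_edu_job)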
Again we see that the best settings go for a small, pruned tree: cp = 0.01, minsplit = 50, maxdepth = 3, weighted = TRUE.
As for the CV metrics: PR-AUC around 0.469, ROC-AUC around 0.755, recall@top10% around 0.27. That CV recall@10% is a bit lower than the one we got with DT1, but the PR-AUC is basically the same.
Now we can train on full training data with the engineered interactions
The results are basically identical to the DT1 test metrics. This is a good sign of stability and no overfitting, but it also means the engineered interactions did not yield extra lift on the depth-capped tree.
rpart.plot(final_dt2, type =2, extra =101, fallen.leaves =TRUE)
DT2 Results
The plotted tree is the same shape as DT-1: the top splits on euribor3m, then cons_conf_idx, then the month of “may” branch. In other words, with maxdepth = 3 and the chosen penalties, the model never needed the new interaction features (recent_x_intensity, edu_x_job). Macro variables and basic calendar effects still dominate the early decisions, again matching the EDA.
The aggressive pruning (depth 3, cp 0.01) keeps the variance low while the test metrics match the CV.
At this depth and with class weights, the interactions did not materially change performance or splits. Part of this is by design, since depth 3 limits how many interactions a single tree can express, but it serves the purpose of illustrating the experiment.
With recall@top10% around 0.387, the tree surfaces roughly 39% of all eventual “yes” clients within the top 10% of leads. This is useful, but we likely have headroom for improvement.
We now log the experiment results
# Log the experiment
exp_log <- log_experiment(
  exp_log,
  id         = "DT-2",
  model      = "Decision Tree",
  objective  = "Test engineered features (recent_contact, intensity, interactions) with class weights.",
  variation  = paste0("Features with interactions; grid over cp/minsplit/maxdepth; weights=", pick_dt2$weighted),
  controls   = "Same split/CV; duration excluded; na.roughfix for tree.",
  metrics    = "PR-AUC (primary), Recall@Top10%, ROC-AUC",
  result     = fmt_res(dt2_test_metrics),
  conclusion = "Feature engineering held parity and slightly improved ranking consistency.",
  recommend  = "Keep engineered features; retain weights and pruning."
)
exp_log
# A tibble: 2 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
2 DT-2 Decisi… Test eng… Features… Same sp… PR-AUC… Test … Feature e… Keep eng…
Random Forests
For random forests we’ll implement the following experiments
RF1: Baseline RF with class weights
Objective: To establish a strong, low variance baseline for tabular data.
Variation: We’ll use class weights, ntree=500, default mtry=⌊√p⌋.
Controls: Same features as DT2 (engineered present), same split/CV.
We’ll use folds_dt2, that we built earlier on train_df_dt2 so RF sees the same engineered features and consistent factor levels.
Finally, we have to add a “compressor” function to collapse high-cardinality factors so that every factor passed to RF stays under randomForest’s 53-level limit for categorical predictors (a limit we were hitting). The safest approach is to keep the top K most frequent levels from the training set and lump the rest into “Other”, then apply that same mapping to the validation and test folds.
# Keep at most `k` most frequent levels (by TRAIN frequency); lump the rest into "Other"
compress_factor_levels <- function(train_fct, new_fct, k = 50, other_level = "Other") {
  # Frequency ranking on training only
  freq <- sort(table(train_fct), decreasing = TRUE)
  keep <- names(freq)[seq_len(min(k, length(freq)))]
  train_out <- forcats::fct_other(train_fct, keep = keep, other_level = other_level)
  new_out   <- forcats::fct_other(new_fct, keep = keep, other_level = other_level)
  # Lock the level set so RF sees identical levels everywhere
  lvl <- union(keep, other_level)
  train_out <- factor(train_out, levels = lvl)
  new_out   <- factor(new_out, levels = lvl)
  list(train = train_out, new = new_out, keep = keep)
}
# Check which factor(s) are too large
sapply(train_df_dt2 |> dplyr::select(where(is.factor)), nlevels)
# Compress edu_x_job to at most 50 levels (we can tweak k if needed; fewer than 53 is required)
cmp <- compress_factor_levels(train_df_dt2$edu_x_job, test_df_dt2$edu_x_job, k = 50)
train_df_dt2$edu_x_job <- cmp$train
test_df_dt2$edu_x_job  <- cmp$new
And then we rebuild CV folds on the compressed frame
folds_dt2 <- rsample::vfold_cv(train_df_dt2, v =5, strata = y)
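The RF-1 fit itself is not reproduced above; a hedged sketch with randomForest, using the mtry_default/p_rf names that appear later and the hypothetical compute_metrics helper sketched earlier:
# Sketch only: class-weighted RF baseline with default mtry = floor(sqrt(p))
p_rf         <- ncol(train_df_dt2) - 1
mtry_default <- floor(sqrt(p_rf))
rf1_fit <- randomForest::randomForest(
  y ~ ., data = randomForest::na.roughfix(train_df_dt2),
  ntree = 500, mtry = mtry_default,
  classwt = compute_class_weights(train_df_dt2)      # class weights for imbalance
)
rf1_preds <- tibble::tibble(
  .pred_yes = predict(rf1_fit, randomForest::na.roughfix(test_df_dt2), type = "prob")[, "yes"],
  y = test_df_dt2$y
)
rf1_test_metrics <- compute_metrics(rf1_preds)       # hypothetical helper from earlier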
So how does this compare to the DT baseline? Our best DT had a PR-AUC of around 0.463, ROC-AUC around 0.765, and Recall@Top10% around 0.387 on the same holdout. So RF1 underperforms both DTs on the business-relevant metrics (PR-AUC, Recall@10) and is slightly worse on ROC-AUC too.
This may indicate that mtry was too small. With factors handled natively, we had a good number of predictors, but an mtry of 5 was perhaps too restrictive, so strong splits (such as macro indicators, month/contact variables) aren’t sampled often enough at each node, which may flatten the signal.
We should also note that collapsing edu_x_job to the top K plus “Other” and using na.roughfix is necessary for randomForest, but it blunts the rare-interaction signal. So a single shallow DT can still key on the macro-cycle features and perform better.
So far the generalization is consistent: the CV-to-test drops are small, so the model is behaving stably. It’s just not strong enough at ranking positives for this dataset with the current defaults.
Now we add the results to the log
# Log experiment
exp_log <- log_experiment(
  exp_log,
  id         = "RF-1",
  model      = "Random Forest",
  objective  = "Establish a fast ensemble baseline.",
  variation  = sprintf("num.trees=500, mtry=%s, no depth cap, class weights", mtry_default),
  controls   = "Same features/split; na.roughfix; class weights for imbalance.",
  metrics    = "PR-AUC (primary), Recall@Top10%, ROC-AUC",
  result     = fmt_res(rf1_test_metrics),
  conclusion = "Underperforms against DT-1/DT-2",
  recommend  = "Readjust mtry for RF-2."
)
exp_log
# A tibble: 3 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
2 DT-2 Decisi… Test eng… Features… Same sp… PR-AUC… Test … Feature e… Keep eng…
3 RF-1 Random… Establis… num.tree… Same fe… PR-AUC… Test … Underperf… Readjust…
RF2
Based on the previous results, we’ll adjust the helpers, but this time we’ll use the ranger library, which is much faster.
fit_predict_rf_ranger <- function(train_data, valid_data,
                                  num.trees = 500, mtry = NULL,
                                  max.depth = NULL, min.node.size = 1,
                                  class_wts = NULL, seed = 489,
                                  num.threads = max(1, parallel::detectCores() - 1)) {
  set.seed(seed)
  # mtry default = sqrt(p) if not provided
  p <- ncol(train_data) - 1
  if (is.null(mtry)) mtry <- floor(sqrt(p))
  # Build case weights from class weights (so class imbalance is handled)
  w <- rep(1, nrow(train_data))
  if (!is.null(class_wts)) {
    w[train_data$y == "no"]  <- class_wts["no"]
    w[train_data$y == "yes"] <- class_wts["yes"]
  }
  # Keep NA handling consistent with earlier steps
  tr2 <- randomForest::na.roughfix(train_data)
  vl2 <- randomForest::na.roughfix(valid_data)
  # Fit ranger (fast RF)
  fit <- ranger::ranger(
    y ~ ., data = tr2,
    probability   = TRUE,
    num.trees     = num.trees,
    mtry          = mtry,
    max.depth     = max.depth,      # NULL = unlimited depth
    min.node.size = min.node.size,  # terminal node size
    case.weights  = w,
    importance    = "impurity",
    num.threads   = num.threads,
    seed          = seed
  )
  # Predict probabilities for the positive class
  preds <- predict(fit, data = vl2)$predictions[, "yes"]
  tibble(.pred_yes = preds, y = valid_data$y)
}
# Try larger mtry values so strong features compete more often
mtry_grid      <- unique(c(mtry_default, ceiling(p_rf / 2), p_rf))  # in this case {sqrt(p), p/2, p}
num_trees_grid <- c(500)      # a fast baseline
max_depth_grid <- c(NA, 12)   # NA = unlimited, 12 = light cap
min_node_grid  <- c(1)        # we'll keep it simple
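The CV loop over this grid is not shown; a sketch of how the settings might be evaluated with 3-fold CV (class_wts, compute_metrics and the pick_rf2 selection step are assumed or hypothetical names from earlier sketches):
# Sketch only: 3-fold CV over the RF-2 grid, averaging fold metrics per setting
rf2_grid <- tidyr::expand_grid(
  num.trees = num_trees_grid, mtry = mtry_grid,
  max.depth = max_depth_grid, min.node.size = min_node_grid
)
folds3 <- rsample::vfold_cv(train_df_dt2, v = 3, strata = y)

rf2_cv <- rf2_grid |>
  dplyr::mutate(metrics = purrr::pmap(
    list(num.trees, mtry, max.depth, min.node.size),
    function(nt, m, md, mn) {
      purrr::map_dfr(folds3$splits, function(s) {
        preds <- fit_predict_rf_ranger(
          rsample::analysis(s), rsample::assessment(s),
          num.trees = nt, mtry = m,
          max.depth = if (is.na(md)) NULL else md,
          min.node.size = mn, class_wts = class_wts
        )
        compute_metrics(preds)                        # hypothetical helper from earlier
      }) |>
        dplyr::summarise(dplyr::across(dplyr::everything(), mean))
    }
  )) |>
  tidyr::unnest(metrics)

pick_rf2 <- rf2_cv |> dplyr::slice_max(pr_auc, n = 1)  # best setting by CV PR-AUC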
There was a clear improvement versus RF-1 (PR-AUC went from 0.33 to 0.462, Recall@10 from 0.35 to 0.453).
This is on par with, or slightly better than, the DT on the test set: DT had a PR-AUC of 0.463 and Recall@10 of 0.387, so RF2 matches the PR-AUC and beats Recall@10, meaning it captures more positives within the same top-10% call budget.
The depth cap (12) likely reduced noisy deep splits, while class weights kept minority class recall high. The winning mtry=5 (not larger) suggests a few very strong features dominate splits; higher mtry didn’t add value because of correlated predictors and diminishing returns.
# Log experiment
exp_log <- log_experiment(
  exp_log,
  id        = "RF-2",
  model     = "Random Forest (ranger)",
  objective = "Improve ranking (PR-AUC, Recall@Top10%) by faster RF + tuning mtry and shallow depth control.",
  variation = sprintf("3-fold CV; num.trees=%d; mtry=%d; max.depth=%s; min.node.size=%d; class weights; ranger engine",
                      pick_rf2$num.trees, pick_rf2$mtry,
                      ifelse(is.na(pick_rf2$max.depth), "NA", pick_rf2$max.depth),
                      pick_rf2$min.node.size),
  controls  = "Same engineered feature frame as DT-2 (with level compression); same train/test split and metrics.",
  metrics   = "Primary: PR-AUC; Secondary: Recall@Top10%, ROC-AUC",
  result    = sprintf("CV: PR-AUC=%.3f, ROC-AUC=%.3f, Recall@10%%=%.3f | Test: PR-AUC=%.3f, ROC-AUC=%.3f, Recall@10%%=%.3f",
                      pick_rf2$pr_auc, pick_rf2$roc_auc, pick_rf2$rec_at10,
                      rf2_test_metrics$pr_auc, rf2_test_metrics$roc_auc, rf2_test_metrics$rec_at10),
  conclusion = "Tuned ranger RF substantially outperforms RF-1 and improves top-decile recall vs DT while matching PR-AUC. Depth cap helped reduce noise; larger mtry values did not help further.",
  recommend  = "Prefer RF-2 over RF-1. Compare RF-2 against the pruned DT for deployment: RF-2 gives higher Recall@Top10% (more conversions at fixed outreach), with acceptable interpretability via feature importance."
)
exp_log
# A tibble: 4 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
2 DT-2 Decisi… Test eng… Features… Same sp… PR-AUC… Test … Feature e… Keep eng…
3 RF-1 Random… Establis… num.tree… Same fe… PR-AUC… Test … Underperf… Readjust…
4 RF-2 Random… Improve … 3-fold C… Same en… Primar… CV: P… Tuned ran… Prefer R…
AdaBoost
For the AdaBoost experiments we’ll run:
AB1:
Objective: To establish a boosted-stump baseline to measure how far a high-bias model can rank minority “yes” cases.
Variation: Number of boosting rounds (nIter = 150 for CV, 200 for final test).
Controls: Same features (DT2 set), no class weights (AdaBoost reweights internally), same 3-fold CV, same metrics.
Hypothesis: Additional rounds provide modest gains, similar to tree-based models with interaction depth.
Decision rule: Depending on results, will keep as reference and move to higher-capacity weak learners in AB2.
AB2:
Objective: Test whether shallow rpart weak learners (depth 3) in AdaBoost improve minority ranking over stump-based AB1.
Variation: Use maxdepth = 3 and mfinal = 150 (with a quick check at 100), coeflearn = “Zhu”.
Controls: Same DT2 features and split; no external class weights; 3-fold stratified CV; na.roughfix for NAs.
Metric: Primary PR-AUC; secondary Recall@Top10% and ROC-AUC.
Hypothesis: Depth-3 weak learners reduce bias and lift PR-AUC by ~0.02–0.04 vs AB1.
Decision rule: Keep AB2 if PR-AUC higher than AB1 and test PR-AUC doesn’t drop; otherwise revert to AB1 or prefer RF2.
For this assignment we had planned to use AdaBoost.M1 exclusively; however, after it became clear that even a single fold on a severely reduced dataset would be extremely time consuming, we decided to use the fastAdaboost package to handle the larger workload.
For this we’ll use a helper that fits AdaBoost once and predicts probabilities on the validation set. It uses fastAdaboost::adaboost (AdaBoost.M1 with decision stumps).
fit_predict_ab_fast <- function(train_data, valid_data, nIter = 200, seed = 489) {
  set.seed(seed)
  # 1) NA handling consistent with other models
  tr2 <- randomForest::na.roughfix(train_data)
  vl2 <- randomForest::na.roughfix(valid_data)
  # 2) Coerce to base data.frame + enforce plain 2-level factor target
  tr2 <- as.data.frame(tr2)
  vl2 <- as.data.frame(vl2)
  tr2$y <- factor(tr2$y, levels = c("no", "yes"), ordered = FALSE)
  vl2$y <- factor(vl2$y, levels = levels(tr2$y), ordered = FALSE)
  # 3) Make sure no ordered/character predictors sneak in
  to_unordered <- function(df) {
    ord_cols <- vapply(df, is.ordered, logical(1))
    chr_cols <- vapply(df, is.character, logical(1))
    if (any(ord_cols)) df[ord_cols] <- lapply(df[ord_cols], function(x) factor(as.character(x)))
    if (any(chr_cols)) df[chr_cols] <- lapply(df[chr_cols], factor)
    df
  }
  tr2 <- to_unordered(tr2)
  vl2 <- to_unordered(vl2)
  # 4) Fit AdaBoost (stumps)
  ab_fit <- fastAdaboost::adaboost(y ~ ., data = tr2, nIter = nIter)
  # 5) Predict probabilities; handle missing colnames safely
  pr <- predict(ab_fit, newdata = vl2, type = "prob")
  prob_mat <- pr$prob
  yes_prob <- if (!is.null(colnames(prob_mat))) {
    prob_mat[, "yes", drop = TRUE]
  } else {
    # We enforced levels(y) == c("no", "yes"), so 'yes' must be column 2
    prob_mat[, 2, drop = TRUE]
  }
  tibble::tibble(.pred_yes = yes_prob, y = vl2$y)
}
We then wrap this in a 3-fold CV loop for AB1, kept small so it completes quickly; a sketch is shown below.
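A sketch of what that wrapper might look like (compute_metrics again being our hypothetical metrics helper from earlier):
# Sketch only: average AB-1 metrics over 3 stratified folds
ab1_cv <- purrr::map_dfr(
  rsample::vfold_cv(train_df_dt2, v = 3, strata = y)$splits,
  function(s) {
    preds <- fit_predict_ab_fast(rsample::analysis(s), rsample::assessment(s), nIter = 150)
    compute_metrics(preds)                            # hypothetical helper from earlier
  }
) |>
  dplyr::summarise(dplyr::across(dplyr::everything(), mean))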
Despite the amount of time AdaBoost takes, its performance is notably below DT1/DT2 and RF2 (our tuned RF2 hit PR-AUC of around 0.46).
Stumps are very high-bias learners; boosting them for 150–200 rounds still underfits the complex interactions we discovered in the EDA (such as recent × intensity). The gap to RF2 suggests insufficient capacity, not overfitting: ROC and recall are steady from CV to test.
AB1 confirms the baseline: stumps are too weak for this dataset.
Let’s log the results
exp_log <- log_experiment(
  exp_log,
  id         = "AB-1",
  model      = "AdaBoost (fastAdaboost stumps)",
  objective  = "Fast boosting baseline with weak learners (depth=1).",
  variation  = "maxdepth=1, mfinal=150, coeflearn='Zhu'; 3-fold CV; no explicit class weights.",
  controls   = "Same features/split; na.roughfix to handle NAs consistently.",
  metrics    = "PR-AUC (primary), Recall@Top10%, ROC-AUC",
  result     = fmt_res(ab1_test_metrics),
  conclusion = "Stumps underfit; lower PR-AUC and recall vs AB2.",
  recommend  = "Use as baseline only; prefer AB2; consider AB1 only for very tight latency/compute budgets."
)
exp_log
# A tibble: 5 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
2 DT-2 Decisi… Test eng… Features… Same sp… PR-AUC… Test … Feature e… Keep eng…
3 RF-1 Random… Establis… num.tree… Same fe… PR-AUC… Test … Underperf… Readjust…
4 RF-2 Random… Improve … 3-fold C… Same en… Primar… CV: P… Tuned ran… Prefer R…
5 AB-1 AdaBoo… Fast boo… maxdepth… Same fe… PR-AUC… Test … Stumps un… Use as b…
AB2
For the second round of AdaBoost experiments we’ll improve the helpers.
fit_predict_ab_adabag <- function(train_data, valid_data,
                                  mfinal = 100, maxdepth = 2,
                                  coeflearn = "Zhu", seed = 489) {
  set.seed(seed)
  # Same preprocessing as other models
  tr2 <- randomForest::na.roughfix(train_data)
  vl2 <- randomForest::na.roughfix(valid_data)
  tr2 <- as.data.frame(tr2); vl2 <- as.data.frame(vl2)
  tr2$y <- factor(tr2$y, levels = c("no", "yes"), ordered = FALSE)
  vl2$y <- factor(vl2$y, levels = levels(tr2$y), ordered = FALSE)
  # Ensure no ordered/character predictors
  to_unordered <- function(df) {
    ord_cols <- vapply(df, is.ordered, logical(1))
    chr_cols <- vapply(df, is.character, logical(1))
    if (any(ord_cols)) df[ord_cols] <- lapply(df[ord_cols], function(x) factor(as.character(x)))
    if (any(chr_cols)) df[chr_cols] <- lapply(df[chr_cols], factor)
    df
  }
  tr2 <- to_unordered(tr2)
  vl2 <- to_unordered(vl2)
  # Fast rpart controls
  base_ctrl <- rpart::rpart.control(
    maxdepth     = maxdepth,
    minsplit     = 20,
    cp           = 0,
    xval         = 0,
    maxcompete   = 0,
    maxsurrogate = 0,
    usesurrogate = 0
  )
  fit <- adabag::boosting(
    y ~ ., data = tr2,
    mfinal    = mfinal,
    boos      = TRUE,
    coeflearn = coeflearn,
    control   = base_ctrl
  )
  pr <- predict(fit, newdata = vl2)
  prob_mat <- as.matrix(pr$prob)
  # Robust "yes" extraction: use name if present, else index 2 (levels = c("no","yes"))
  yes_prob <- if (!is.null(colnames(prob_mat)) && "yes" %in% colnames(prob_mat)) {
    prob_mat[, "yes", drop = TRUE]
  } else {
    prob_mat[, 2, drop = TRUE]
  }
  tibble::tibble(.pred_yes = yes_prob, y = vl2$y)
}
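As a usage sketch, the final AB-2 fit and holdout evaluation might look like this (ab2_test_metrics is the object logged below; compute_metrics is our hypothetical helper from earlier):
ab2_test_preds   <- fit_predict_ab_adabag(train_df_dt2, test_df_dt2,
                                          mfinal = 150, maxdepth = 3, coeflearn = "Zhu")
ab2_test_metrics <- compute_metrics(ab2_test_preds)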
Moving from stumps (AB1) to depth 3 weak learners increased minority ranking quality. The test PR-AUC jumped from around 0.349 (AB1) to around 0.453 (AB2) and Recall@10% from around 0.37 to around 0.44.
Generalization looks healthy: the test set performs better than CV on PR-AUC and Recall@10, suggesting low variance and that the extra capacity is useful rather than overfitting.
Comparing against the other models, we see that AB2 is competitive with RF2 (recall that the RF2 test PR-AUC was about 0.462 and Recall@10 around 0.453). It clearly beats RF1 and both DT baselines. RF2 still performs slightly better on PR-AUC, but AB2 is within the same band.
Finally we can log these results as well
exp_log <- log_experiment(
  exp_log,
  id         = "AB-2",
  model      = "AdaBoost (adabag + rpart shallow)",
  objective  = "Increase model capacity with shallow weak learners while maintaining generalization.",
  variation  = "maxdepth=3, mfinal=150, coeflearn='Zhu'; 3-fold CV; no explicit class weights.",
  controls   = "Same features/split; na.roughfix.",
  metrics    = "PR-AUC (primary), Recall@Top10%, ROC-AUC",
  result     = fmt_res(ab2_test_metrics),
  conclusion = "Depth=3 boosted PR-AUC and Recall@10; generalization test better than CV.",
  recommend  = "Keep AB2 as a co-finalist with RF2."
)
exp_log
# A tibble: 6 × 9
id model objective variation controls metrics result conclusion recommend
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 DT-1 Decisi… Improve … Grid ove… Same fe… PR-AUC… Test … Weighted … Adopt co…
2 DT-2 Decisi… Test eng… Features… Same sp… PR-AUC… Test … Feature e… Keep eng…
3 RF-1 Random… Establis… num.tree… Same fe… PR-AUC… Test … Underperf… Readjust…
4 RF-2 Random… Improve … 3-fold C… Same en… Primar… CV: P… Tuned ran… Prefer R…
5 AB-1 AdaBoo… Fast boo… maxdepth… Same fe… PR-AUC… Test … Stumps un… Use as b…
6 AB-2 AdaBoo… Increase… maxdepth… Same fe… PR-AUC… Test … Depth=3 b… Keep AB2…