We will learn how to tune hyper-parameters of an XGBoost model through a Customer Churn Prediction exercise.
library(tidyverse)

customer <- read_csv("https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv")

customer <- customer %>%
  mutate_if(is.character, factor)
Plot a summary of all variables with DataExplorer:
library(DataExplorer)
plot_intro(customer)
Summary of the customer data (excluding customerID):
|                        |                |
|------------------------|----------------|
| Name                   | customer[, -1] |
| Number of rows         | 7043           |
| Number of columns      | 20             |
| Column type frequency: |                |
| factor                 | 16             |
| numeric                | 4              |
| Group variables        | None           |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts | pct |
|---|---|---|---|---|---|---|
| gender | 0 | 1 | FALSE | 2 | Mal: 3555, Fem: 3488 | Mal: 0.50, Fem: 0.50 |
| Partner | 0 | 1 | FALSE | 2 | No: 3641, Yes: 3402 | No: 0.52, Yes: 0.48 |
| Dependents | 0 | 1 | FALSE | 2 | No: 4933, Yes: 2110 | No: 0.70, Yes: 0.30 |
| PhoneService | 0 | 1 | FALSE | 2 | Yes: 6361, No: 682 | Yes: 0.90, No: 0.10 |
| MultipleLines | 0 | 1 | FALSE | 3 | No: 3390, Yes: 2971, No : 682 | No: 0.48, Yes: 0.42, No : 0.10 |
| InternetService | 0 | 1 | FALSE | 3 | Fib: 3096, DSL: 2421, No: 1526 | Fib: 0.44, DSL: 0.34, No: 0.22 |
| OnlineSecurity | 0 | 1 | FALSE | 3 | No: 3498, Yes: 2019, No : 1526 | No: 0.50, Yes: 0.29, No : 0.22 |
| OnlineBackup | 0 | 1 | FALSE | 3 | No: 3088, Yes: 2429, No : 1526 | No: 0.44, Yes: 0.34, No : 0.22 |
| DeviceProtection | 0 | 1 | FALSE | 3 | No: 3095, Yes: 2422, No : 1526 | No: 0.44, Yes: 0.34, No : 0.22 |
| TechSupport | 0 | 1 | FALSE | 3 | No: 3473, Yes: 2044, No : 1526 | No: 0.49, Yes: 0.29, No : 0.22 |
| StreamingTV | 0 | 1 | FALSE | 3 | No: 2810, Yes: 2707, No : 1526 | No: 0.40, Yes: 0.38, No : 0.22 |
| StreamingMovies | 0 | 1 | FALSE | 3 | No: 2785, Yes: 2732, No : 1526 | No: 0.40, Yes: 0.39, No : 0.22 |
| Contract | 0 | 1 | FALSE | 3 | Mon: 3875, Two: 1695, One: 1473 | Mon: 0.55, Two: 0.24, One: 0.21 |
| PaperlessBilling | 0 | 1 | FALSE | 2 | Yes: 4171, No: 2872 | Yes: 0.59, No: 0.41 |
| PaymentMethod | 0 | 1 | FALSE | 4 | Ele: 2365, Mai: 1612, Ban: 1544, Cre: 1522 | Ele: 0.34, Mai: 0.23, Ban: 0.22, Cre: 0.22 |
| Churn | 0 | 1 | FALSE | 2 | No: 5174, Yes: 1869 | No: 0.73, Yes: 0.27 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| SeniorCitizen | 0 | 1 | 0.16 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| tenure | 0 | 1 | 32.37 | 24.56 | 0.00 | 9.00 | 29.00 | 55.00 | 72.00 | ▇▃▃▃▆ |
| MonthlyCharges | 0 | 1 | 64.76 | 30.09 | 18.25 | 35.50 | 70.35 | 89.85 | 118.75 | ▇▅▆▇▅ |
| TotalCharges | 11 | 1 | 2283.30 | 2266.77 | 18.80 | 401.45 | 1397.47 | 3794.74 | 8684.80 | ▇▂▂▂▁ |
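The summary tables above match the output of skimr::skim(); a minimal sketch to reproduce them, assuming the skimr package is installed:

library(skimr)

# Skim every column except customerID to reproduce the summary tables above
skim(customer[, -1])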
A quick data exploration:
plot_bar(customer[,-1], by = "Churn")
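Alongside the bar plots, it can also help to see how the numeric variables behave by churn status; a small sketch using DataExplorer's plot_boxplot() (an addition, not part of the original output):

# Boxplots of the numeric variables split by Churn
plot_boxplot(customer[, -1], by = "Churn")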
library(ggstatsplot)

ggcorrmat(customer,
  title = "Relationship between numeric variables",
  subtitle = "Dataset: Telco Customer",
  ggcorrplot.args = list(outline.color = "black", hc.order = TRUE))
customer <- customer %>%
  na.omit() %>%
  mutate_at(vars(SeniorCitizen), factor) %>%
  select(-customerID, -TotalCharges)
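As a quick sanity check (a small sketch, not in the original post), we can confirm that the 11 rows with missing TotalCharges were dropped and that no missing values remain:

# After na.omit(), anyNA() should return FALSE and 7032 rows should remain
anyNA(customer)
dim(customer)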
Split the data into training and testing sets, then create resamples from the training set using cross-validation with vfold_cv():
library(tidymodels)
set.seed(123)
cust_split <- customer %>%
initial_split(prop = 0.8, strata = Churn)
train <- training(cust_split)
test <- testing(cust_split)
# Cross validation folds from training dataset
set.seed(234)
folds <- vfold_cv(train, strata = Churn)
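Because we stratified on Churn, the class balance in the training set (and within each fold) mirrors the full data; a quick check, as a sketch not shown in the original post:

# Churn proportions in the training split, preserved by strata = Churn
train %>%
  count(Churn) %>%
  mutate(prop = n / sum(n))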
Let’s create a base recipe()
cust_rec <- recipe(Churn ~ ., data = train) %>%
  # update_role(customerID, new_role = "ID") %>%
  # step_corr(all_numeric()) %>%
  step_corr(all_numeric(), threshold = 0.7, method = "spearman") %>%
  step_zv(all_numeric()) %>%   # filter zero-variance predictors
  step_normalize(all_numeric()) %>%
  step_dummy(all_nominal_predictors())
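To see what the recipe actually does to the predictors, we can prep() it and bake() the training set (a sketch; prep() estimates the steps and bake(new_data = NULL) returns the processed training data):

# Inspect the preprocessed training data produced by the recipe
cust_rec %>%
  prep() %>%
  bake(new_data = NULL) %>%
  glimpse()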
Let’s start by building the model without any hyperparameter tuning. Instead, we will use the default values for our hyperparameters.
Let’s create a base XGBoost model specification and workflow:
xgb_spec0 <- boost_tree() %>%
set_engine("xgboost") %>%
set_mode("classification")
xgb_wf0 <-
workflow() %>%
add_recipe(cust_rec) %>%
add_model(xgb_spec0)
Fit the model with the resamples:
# XGBoost
xgb_fit0 <- fit_resamples(xgb_wf0,
  resamples = folds,
  metrics = metric_set(accuracy, roc_auc, sens, spec),
  control = control_resamples(save_pred = TRUE))
Performance metrics
xgb_fit0 %>%
collect_metrics()
## # A tibble: 4 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 accuracy binary 0.797 10 0.00528 Preprocessor1_Model1
## 2 roc_auc binary 0.842 10 0.00498 Preprocessor1_Model1
## 3 sens binary 0.895 10 0.00679 Preprocessor1_Model1
## 4 spec binary 0.526 10 0.00949 Preprocessor1_Model1
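Since we saved the predictions with save_pred = TRUE, we can also draw the resampled ROC curves for this baseline model (a sketch, output not shown in the original post):

# One ROC curve per fold; "Yes" (churn) is treated as the event of interest
xgb_fit0 %>%
  collect_predictions() %>%
  group_by(id) %>%
  roc_curve(Churn, .pred_Yes, event_level = "second") %>%
  autoplot()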
Final fit to the whole training set and evaluate on the test set:
xgb_final <- last_fit(
  xgb_wf0,
  split = cust_split,
  metrics = metric_set(accuracy, roc_auc, sens, spec)
)
Performance metrics on the unseen testing data:
xgb_final %>%
collect_metrics()
## # A tibble: 4 × 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.802 Preprocessor1_Model1
## 2 sens binary 0.894 Preprocessor1_Model1
## 3 spec binary 0.548 Preprocessor1_Model1
## 4 roc_auc binary 0.841 Preprocessor1_Model1
Confusion matrix
xgb_final %>%
collect_predictions() %>%
conf_mat(Churn,.pred_class)
## Truth
## Prediction No Yes
## No 924 169
## Yes 109 205
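A broader set of summary statistics can be derived directly from this confusion matrix with summary() (a sketch; output not shown):

# Accuracy, sensitivity, specificity and more, computed from the confusion matrix
xgb_final %>%
  collect_predictions() %>%
  conf_mat(Churn, .pred_class) %>%
  summary()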
Next, we will build the model with hyperparameter tuning.
Create an XGBoost model specification with the hyperparameters we plan to tune:
# Set up a model specification
xgb_spec <- boost_tree(
  trees = 500,
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),             ## first three: model complexity
  sample_size = tune(), mtry = tune(), ## randomness
  learn_rate = tune()                  ## step size
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")
xgb_spec
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 500
## min_n = tune()
## tree_depth = tune()
## learn_rate = tune()
## loss_reduction = tune()
## sample_size = tune()
##
## Computational engine: xgboost
# Pass the formula and the model specification to a workflow
xgb_wf <- workflow() %>%
add_formula(Churn ~.) %>%
add_model(xgb_spec)
xgb_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: boost_tree()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## Churn ~ .
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 500
## min_n = tune()
## tree_depth = tune()
## learn_rate = tune()
## loss_reduction = tune()
## sample_size = tune()
##
## Computational engine: xgboost
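As a side check, we can list the parameters that were flagged with tune() (a sketch; extract_parameter_set_dials() comes from the tidymodels dials/hardhat family):

# Parameters marked for tuning in the workflow
extract_parameter_set_dials(xgb_wf)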
Set up the search grid with grid_latin_hypercube(), a space-filling design that covers the hyperparameter space efficiently:
xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), train),
  learn_rate(),
  size = 20
)

xgb_grid
## # A tibble: 20 × 6
## tree_depth min_n loss_reduction sample_size mtry learn_rate
## <int> <int> <dbl> <dbl> <int> <dbl>
## 1 8 32 8.88e- 7 0.937 9 4.81e- 8
## 2 6 5 4.53e-10 0.257 8 4.00e- 4
## 3 4 36 5.16e- 3 0.139 3 2.20e-10
## 4 14 2 5.16e- 9 0.334 11 1.31e- 9
## 5 11 18 4.93e+ 0 0.693 6 1.04e- 3
## 6 12 40 1.79e- 4 0.455 13 2.25e- 2
## 7 10 13 1.25e- 5 0.898 12 1.67e- 7
## 8 14 19 1.18e+ 1 0.677 3 2.21e- 6
## 9 2 37 1.87e- 3 0.555 12 9.63e- 8
## 10 5 17 2.16e- 1 0.784 14 3.08e- 3
## 11 4 11 9.24e- 8 0.762 10 5.48e- 3
## 12 6 27 1.29e- 1 0.207 16 4.84e-10
## 13 10 22 1.21e- 2 0.635 4 6.43e- 9
## 14 7 34 2.49e- 6 0.398 5 2.32e- 5
## 15 12 28 1.31e+ 0 0.547 2 2.69e- 5
## 16 9 30 4.87e- 8 0.983 16 5.27e- 6
## 17 3 14 3.45e- 4 0.162 15 5.67e- 7
## 18 2 7 3.46e- 5 0.323 18 4.11e- 9
## 19 9 24 2.09e-10 0.490 17 1.56e- 4
## 20 13 9 1.08e- 8 0.863 6 6.10e- 2
We use the tune_grid() function with the workflow, the cross-validation folds, and the grid to tune the parameters:
library(finetune)
doParallel::registerDoParallel()
set.seed(234)
xgb_res <- tune_grid(
  xgb_wf,
  resamples = folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE)
)
xgb_res
## # Tuning results
## # 10-fold cross-validation using stratification
## # A tibble: 10 × 5
## splits id .metrics .notes .predictions
## <list> <chr> <list> <list> <list>
## 1 <split [5062/563]> Fold01 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 2 <split [5062/563]> Fold02 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 3 <split [5062/563]> Fold03 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 4 <split [5062/563]> Fold04 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 5 <split [5062/563]> Fold05 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 6 <split [5063/562]> Fold06 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 7 <split [5063/562]> Fold07 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 8 <split [5063/562]> Fold08 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 9 <split [5063/562]> Fold09 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
## 10 <split [5063/562]> Fold10 <tibble [40 × 10]> <tibble [0 × 3]> <tibble>
Visualize how roc_auc varies across the tuning parameters:
xgb_res %>%
  collect_metrics() %>%
  filter(.metric == "roc_auc") %>%
  select(mean, mtry:sample_size) %>%
  pivot_longer(mtry:sample_size,
               names_to = "parameter",
               values_to = "value") %>%
  ggplot(aes(value, mean, color = parameter)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~parameter, scales = "free_x")
We can also use show_best() to display the top hyperparameter combinations, or select_best() to pick the single best one:
show_best(xgb_res, "roc_auc")
## # A tibble: 5 × 12
## mtry min_n tree_depth learn_rate loss_r…¹ sampl…² .metric .esti…³ mean n
## <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int>
## 1 13 40 12 2.25e-2 1.79e- 4 0.455 roc_auc binary 0.848 10
## 2 14 17 5 3.08e-3 2.16e- 1 0.784 roc_auc binary 0.848 10
## 3 10 11 4 5.48e-3 9.24e- 8 0.762 roc_auc binary 0.847 10
## 4 12 13 10 1.67e-7 1.25e- 5 0.898 roc_auc binary 0.846 10
## 5 8 5 6 4.00e-4 4.53e-10 0.257 roc_auc binary 0.844 10
## # … with 2 more variables: std_err <dbl>, .config <chr>, and abbreviated
## # variable names ¹loss_reduction, ²sample_size, ³.estimator
best_auc <- select_best(xgb_res, "roc_auc")
best_auc
## # A tibble: 1 × 7
## mtry min_n tree_depth learn_rate loss_reduction sample_size .config
## <int> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 13 40 12 0.0225 0.000179 0.455 Preprocessor1_Mo…
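An alternative to select_best() is select_by_one_std_err(), which picks a simpler candidate whose performance is within one standard error of the numerically best one (a sketch; sorting by tree_depth is an assumption about what "simpler" means here):

# Prefer a shallower tree depth within one standard error of the best roc_auc
select_by_one_std_err(xgb_res, tree_depth, metric = "roc_auc")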
Then we will use the values from best_auc to finalize the workflow:
final_xgb <- finalize_workflow(xgb_wf, best_auc)
final_xgb
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: boost_tree()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## Churn ~ .
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
## mtry = 13
## trees = 500
## min_n = 40
## tree_depth = 12
## learn_rate = 0.0224765893635175
## loss_reduction = 0.000179217810199367
## sample_size = 0.454792870174861
##
## Computational engine: xgboost
Check variable importance:
# extract_fit_parsnip()
library(vip)
final_xgb %>%
fit(data = train) %>%
extract_fit_parsnip() %>%
vip(geom = "point")
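If a table of importance scores is preferred over the plot, vip also provides vi(), which should accept the same fitted object that vip() does (a sketch, not in the original post):

# Variable importance scores as a tibble instead of a plot
final_xgb %>%
  fit(data = train) %>%
  extract_fit_parsnip() %>%
  vi()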
Now, with the last_fit() function, we will fit our final model to the whole training data set and evaluate it on the unseen testing data. The performance metrics are shown below:
final_rs <- last_fit(final_xgb, cust_split,
  metrics = metric_set(accuracy, roc_auc, sens, spec))
final_rs %>%
collect_metrics()
## # A tibble: 4 × 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.810 Preprocessor1_Model1
## 2 sens binary 0.910 Preprocessor1_Model1
## 3 spec binary 0.535 Preprocessor1_Model1
## 4 roc_auc binary 0.850 Preprocessor1_Model1
Our results indicate an increase in both accuracy and roc_auc with hyperparameter tuning.
Confusion matrix
final_rs %>%
collect_predictions() %>%
conf_mat(Churn, .pred_class)
## Truth
## Prediction No Yes
## No 940 174
## Yes 93 200
ROC Curve
final_rs %>%
collect_predictions() %>%
roc_curve(Churn, .pred_Yes, event_level="second") %>%
autoplot()
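Finally, the trained workflow can be pulled out of the last_fit() results and used to score new customers (a sketch; using the first five test rows is just for illustration):

# Extract the fitted workflow and predict churn probabilities for new data
fitted_wf <- extract_workflow(final_rs)
predict(fitted_wf, new_data = head(test, 5), type = "prob")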