Modeling

Data source

Train-test split

         exited=0         exited=1
Train    6,376 (79.7%)    1,624 (20.3%)
Test     1,587 (79.3%)    413 (20.6%)
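
The page does not say how the 80/20 split was produced. As a minimal sketch, assuming a split stratified on the target so that both partitions keep roughly the same share of exited customers, something like the following base R code would work. The data frame `D` here is a synthetic stand-in, and the seed and the 20% churn rate are illustrative assumptions, not values taken from the data source.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical stand-in for the full dataset: 10,000 rows, about 20% exited
D <- data.frame(CustomerId = 1:10000,
                Exited = rbinom(10000, size = 1, prob = 0.2))

# Draw 80% of the row indices separately within each class of Exited
train_idx <- unlist(lapply(
  split(seq_len(nrow(D)), D$Exited),
  function(idx) sample(idx, size = floor(0.8 * length(idx)))
))

D_train <- D[train_idx, ]
D_test  <- D[-train_idx, ]

# Class shares (in percent) within each partition
round(100 * prop.table(table(D_train$Exited)), 1)
round(100 * prop.table(table(D_test$Exited)), 1)
```

A plain random split (sampling 80% of all rows at once) would also give class shares close to those in the table; stratifying only removes the small sampling difference between the two partitions.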

Modeling by XGBoost

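library(dplyr)    # provides %>% and select()
library(xgboost)

# Predictors: drop the row index, the customer ID, and the target column
X <- D_train %>%
    select(-RowNumber, -CustomerId, -Exited)

# Boosted trees with a logistic objective; the label must be numeric 0/1
M1 <- xgboost(data = data.matrix(X),
              label = D_train$Exited,
              max.depth = 6, eta = 0.1, nrounds = 250,
              nthread = 6, objective = "binary:logistic",
              verbose = 0)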

AUC - ROC (of training dataset)

AUC - ROC (of testing dataset)
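
The ROC plots themselves are not reproduced here. As a rough illustration of what the AUC figure measures, the sketch below computes it in base R from the rank-sum (Mann-Whitney) identity: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `auc_rank` and the example scores are hypothetical; on this page the scores would presumably be the model's predicted probabilities for the training or testing rows.

```r
# AUC via the rank-sum identity
auc_rank <- function(score, label) {
  r <- rank(score)                 # average ranks, so tied scores count as 1/2
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical scores and true labels
score <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
label <- c(1,   1,   1,    0,   0,   0)

auc_rank(score, label)  # 8 of the 9 positive/negative pairs are ordered correctly
```

Comparing the value on the training rows with the value on the testing rows is the usual way to read the two panels: a large gap suggests overfitting.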

Metrics of testing dataset

Threshold: 0.243

accuracy    recall    precision
81.9%       90.11%    86.7%
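
To make the role of the threshold concrete, here is a minimal base R sketch of how the three metrics can be derived from predicted probabilities. It assumes that exited=1 is the positive class and that a row is predicted positive when its probability is at least the threshold; the page does not state either choice. The function name `threshold_metrics` and the example vectors are hypothetical.

```r
# Accuracy, recall and precision at a given probability cutoff
threshold_metrics <- function(prob, actual, threshold = 0.243) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)  # true positives
  fp <- sum(pred == 1 & actual == 0)  # false positives
  tn <- sum(pred == 0 & actual == 0)  # true negatives
  fn <- sum(pred == 0 & actual == 1)  # false negatives
  c(accuracy  = (tp + tn) / length(actual),
    recall    = tp / (tp + fn),
    precision = tp / (tp + fp))
}

# Hypothetical predicted probabilities and true labels
prob   <- c(0.05, 0.10, 0.30, 0.26, 0.50, 0.90, 0.15, 0.25)
actual <- c(0,    0,    0,    0,    1,    1,    1,    1)

threshold_metrics(prob, actual)
```

A cutoff below 0.5 flags more rows as positive, which tends to raise recall for the positive class at the cost of precision. Which class is treated as positive changes the reported recall and precision, so that choice should be stated next to the numbers.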

Feature Importance

by SHAP values
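
The importance plot is not reproduced here, and the page does not show how the SHAP values were obtained. One possible route, sketched below on synthetic data, is the xgboost package's own `predict(..., predcontrib = TRUE)`, which returns one contribution per feature plus a final bias column for every row. The feature names, the data, and the small training settings are all illustrative assumptions; the mean absolute contribution per feature is a common summary, not necessarily the one used for the plot.

```r
library(xgboost)

set.seed(1)  # arbitrary seed for the synthetic example

# Hypothetical data: one informative feature and two uninformative ones
n <- 500
X <- cbind(f_signal = rnorm(n), f_weak = rnorm(n), f_noise = rnorm(n))
y <- as.numeric(X[, "f_signal"] + rnorm(n, sd = 0.5) > 0)

dtrain <- xgb.DMatrix(data = X, label = y)
bst <- xgb.train(
  params = list(objective = "binary:logistic",
                max_depth = 3, eta = 0.1, nthread = 1),
  data = dtrain, nrounds = 30, verbose = 0
)

# Per-row SHAP contributions: one column per feature, bias term in the last column
shap <- predict(bst, dtrain, predcontrib = TRUE)

# Global importance: mean absolute contribution per feature, bias column dropped
importance <- colMeans(abs(shap[, seq_len(ncol(X)), drop = FALSE]))
sort(importance, decreasing = TRUE)
```

The contributions are on the log-odds scale, so a feature with a large mean absolute value moves the predicted log-odds of exiting the most, in either direction.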