The goal is to automate building and tuning a classification model with h2o::h2o.automl(). The target in the code below is sellable_online.

Set up

Import data

Import the cleaned data from Module 7.

library(h2o)
## 
## ----------------------------------------------------------------------
## 
## Your next step is to start H2O:
##     > h2o.init()
## 
## For H2O package documentation, ask for help:
##     > ??h2o
## 
## After starting H2O, you can use the Web UI at http://localhost:54321
## For more information visit https://docs.h2o.ai
## 
## ----------------------------------------------------------------------
## 
## Attaching package: 'h2o'
## The following objects are masked from 'package:stats':
## 
##     cor, sd, var
## The following objects are masked from 'package:base':
## 
##     &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
##     colnames<-, ifelse, is.character, is.factor, is.numeric, log,
##     log10, log1p, log2, round, signif, trunc
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::day()   masks h2o::day()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ lubridate::hour()  masks h2o::hour()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ lubridate::month() masks h2o::month()
## ✖ lubridate::week()  masks h2o::week()
## ✖ lubridate::year()  masks h2o::year()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
## ✔ broom        1.0.12     ✔ rsample      1.3.2 
## ✔ dials        1.4.2      ✔ tailor       0.1.0 
## ✔ infer        1.1.0      ✔ tune         2.0.1 
## ✔ modeldata    1.5.1      ✔ workflows    1.3.0 
## ✔ parsnip      1.4.1      ✔ workflowsets 1.1.1 
## ✔ recipes      1.3.1      ✔ yardstick    1.3.2 
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ recipes::fixed()  masks stringr::fixed()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step()   masks stats::step()
library(tidyquant)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## ── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.12 ──
## ✔ PerformanceAnalytics 2.0.8      ✔ TTR                  0.24.4
## ✔ quantmod             0.4.28     ✔ xts                  0.14.2
## ── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
## ✖ zoo::as.Date()                 masks base::as.Date()
## ✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
## ✖ scales::col_factor()           masks readr::col_factor()
## ✖ lubridate::day()               masks h2o::day()
## ✖ scales::discard()              masks purrr::discard()
## ✖ dplyr::filter()                masks stats::filter()
## ✖ xts::first()                   masks dplyr::first()
## ✖ recipes::fixed()               masks stringr::fixed()
## ✖ lubridate::hour()              masks h2o::hour()
## ✖ dplyr::lag()                   masks stats::lag()
## ✖ xts::last()                    masks dplyr::last()
## ✖ PerformanceAnalytics::legend() masks graphics::legend()
## ✖ TTR::momentum()                masks dials::momentum()
## ✖ lubridate::month()             masks h2o::month()
## ✖ yardstick::spec()              masks readr::spec()
## ✖ quantmod::summary()            masks h2o::summary(), base::summary()
## ✖ lubridate::week()              masks h2o::week()
## ✖ lubridate::year()              masks h2o::year()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_csv("../00 data/data_wrangled/data_clean.csv") %>%
    
    # h2o requires all variables to be either numeric or factors
    mutate(across(where(is.character), factor))
## Rows: 1899 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, category, other_colors, short_description, designer
## dbl (5): item_id, price, depth, height, width
## lgl (1): sellable_online
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
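
The column specification above lists sellable_online as a logical column, which where(is.character) does not select. The following is a minimal base R sketch of converting both character and logical columns to factors. The data frame `toy` and the helper `to_factor` are hypothetical and only stand in for the imported table.

```r
# Hypothetical toy data standing in for the imported table.
toy <- data.frame(
  name            = c("STIG", "INGOLF", "NORBERG"),
  price           = c(4.23, 5.84, 5.42),
  sellable_online = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Convert character and logical columns to factors; leave numerics alone.
to_factor <- function(col) {
  if (is.character(col) || is.logical(col)) factor(col) else col
}
toy[] <- lapply(toy, to_factor)

str(toy)
```

In this run the leader is reported as an H2OBinomialModel, so the target was evidently treated as categorical; making the conversion explicit is a precaution rather than a required fix.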

Split

set.seed(1234)

data_split <- initial_split(data, strata = "sellable_online")
train_tbl <- training(data_split)
test_tbl <- testing(data_split)
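
Stratifying on the target matters here because the classes are very unbalanced: the confusion matrices further down show only a handful of FALSE rows. The sketch below illustrates the idea of a stratified split in base R on a hypothetical toy target. It is not how rsample implements initial_split(), and all names and counts in it are illustrative.

```r
set.seed(1234)

# Hypothetical imbalanced target: 20 FALSE, 980 TRUE.
target <- c(rep(FALSE, 20), rep(TRUE, 980))

# Stratified 75/25 split: sample row indices within each class separately.
idx_by_class <- split(seq_along(target), target)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.75 * length(i)))
}))

train_target <- target[train_idx]
test_target  <- target[-train_idx]

# Share of each class per partition; the two rows should be close.
rbind(
  train = prop.table(table(train_target)),
  test  = prop.table(table(test_target))
)
```

Running table() on the target column of train_tbl and test_tbl gives the same kind of check for the real split.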

Recipes

recipe_obj <- recipe(sellable_online ~ ., data = train_tbl) %>%
    # Remove predictors that contain only a single value
    step_zv(all_predictors())
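
In the code shown here the recipe is defined but not prepped or applied before the data goes to H2O. As a rough illustration of what a zero-variance filter does, here is a base R sketch on hypothetical toy data. It mimics the effect of step_zv() and is not the recipes implementation.

```r
# Hypothetical toy predictors; `constant` holds a single value.
toy <- data.frame(
  price    = c(4.23, 5.42, 5.84),
  constant = c(1, 1, 1),
  category = c("Bar furniture", "Beds", "Beds")
)

# Keep only columns with more than one distinct value.
has_variance <- vapply(toy, function(col) length(unique(col)) > 1, logical(1))
toy_filtered <- toy[, has_variance, drop = FALSE]

names(toy_filtered)
```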

Model

h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 days 2 hours 
##     H2O cluster timezone:       America/New_York 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.44.0.3 
##     H2O cluster version age:    2 years, 4 months and 8 days 
##     H2O cluster name:           H2O_started_from_R_tyler_fhp551 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.22 GB 
##     H2O cluster total cores:    10 
##     H2O cluster allowed cores:  10 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 4.5.2 (2025-10-31)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is (2 years, 4 months and 8 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
split.h2o <- h2o.splitFrame(as.h2o(train_tbl), ratios = c(0.85), seed = 2345)
train_h2o <- split.h2o[[1]]
valid_h2o <- split.h2o[[2]]
test_h2o  <- as.h2o(test_tbl)
y <- "sellable_online"
x <- setdiff(names(train_tbl), y)

models_h2o <- h2o.automl(
    x = x,
    y = y, 
    training_frame    = train_h2o,
    validation_frame  = valid_h2o, 
    leaderboard_frame = test_h2o, 
    # max_runtime_secs  = 30, 
    max_models        = 10, 
    exclude_algos     = "DeepLearning",
    nfolds            = 5, 
    seed              = 3456   
)
## 11:17:32.906: User specified a validation frame with cross-validation still enabled. Please note that the models will still be validated using cross-validation only, the validation frame will be used to provide purely informative validation metrics on the trained models.
## 11:17:32.907: AutoML: XGBoost is not available; skipping it.
models_h2o %>% typeof()
## [1] "S4"
models_h2o %>% slotNames()
## [1] "project_name"   "leader"         "leaderboard"    "event_log"     
## [5] "modeling_steps" "training_info"
models_h2o@leaderboard
##                                                   model_id       auc    logloss
## 1                          GBM_4_AutoML_16_20260429_111732 0.9812853 0.04432942
## 2    StackedEnsemble_AllModels_1_AutoML_16_20260429_111732 0.9724576 0.02953884
## 3                          GBM_1_AutoML_16_20260429_111732 0.9706921 0.04081237
## 4 StackedEnsemble_BestOfFamily_1_AutoML_16_20260429_111732 0.9661017 0.02688974
## 5                          GLM_1_AutoML_16_20260429_111732 0.9512712 0.02525709
## 6             GBM_grid_1_AutoML_16_20260429_111732_model_1 0.9382062 0.04498623
##       aucpr mean_per_class_error       rmse         mse
## 1 0.9998785            0.5000000 0.07944969 0.006312253
## 2 0.9998218            0.3333333 0.07648578 0.005850074
## 3 0.9998106            0.5000000 0.07926326 0.006282665
## 4 0.9997768            0.3333333 0.07439248 0.005534241
## 5 0.9996682            0.3333333 0.06889723 0.004746829
## 6 0.9995773            0.5000000 0.07938414 0.006301841
## 
## [12 rows x 7 columns]
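
The leading model combines an AUC close to 1 with a mean_per_class_error of 0.5. The arithmetic behind that can be reproduced from the test-set confusion matrix reported under Evaluate (0 of 3 FALSE rows and 472 of 472 TRUE rows classified correctly). The object names below are illustrative.

```r
# Counts taken from the test-set confusion matrix in the Evaluate section.
cm <- matrix(
  c(0,   3,
    0, 472),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("FALSE", "TRUE"), predicted = c("FALSE", "TRUE"))
)

# Per-class error: share of each actual class that was misclassified.
per_class_error <- 1 - diag(cm) / rowSums(cm)

mean_per_class_error <- mean(per_class_error)
overall_error        <- 1 - sum(diag(cm)) / sum(cm)

mean_per_class_error  # 0.5
overall_error         # 3 / 475, about 0.0063
```

Every FALSE row is missed, so that class contributes an error of 1, while the overall error stays tiny because FALSE rows are so rare.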
models_h2o@leader
## Model Details:
## ==============
## 
## H2OBinomialModel: gbm
## Model ID:  GBM_4_AutoML_16_20260429_111732 
## Model Summary: 
##   number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1              35                       35               35418         1
##   max_depth mean_depth min_leaves max_leaves mean_leaves
## 1        10    8.94286          2         30    17.60000
## 
## 
## H2OBinomialMetrics: gbm
## ** Reported on training data. **
## 
## MSE:  0.0006697189
## RMSE:  0.02587893
## LogLoss:  0.002641406
## Mean Per-Class Error:  0
## AUC:  1
## AUCPR:  1
## Gini:  1
## R^2:  0.9092853
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        FALSE TRUE    Error     Rate
## FALSE      9    0 0.000000     =0/9
## TRUE       0 1201 0.000000  =0/1201
## Totals     9 1201 0.000000  =0/1210
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold       value idx
## 1                       max f1  0.465402    1.000000 117
## 2                       max f2  0.465402    1.000000 117
## 3                 max f0point5  0.465402    1.000000 117
## 4                 max accuracy  0.465402    1.000000 117
## 5                max precision  0.999799    1.000000   0
## 6                   max recall  0.465402    1.000000 117
## 7              max specificity  0.999799    1.000000   0
## 8             max absolute_mcc  0.465402    1.000000 117
## 9   max min_per_class_accuracy  0.465402    1.000000 117
## 10 max mean_per_class_accuracy  0.465402    1.000000 117
## 11                     max tns  0.999799    9.000000   0
## 12                     max fns  0.999799 1200.000000   0
## 13                     max fps  0.008310    9.000000 126
## 14                     max tps  0.465402 1201.000000 117
## 15                     max tnr  0.999799    1.000000   0
## 16                     max fnr  0.999799    0.999167   0
## 17                     max fpr  0.008310    1.000000 126
## 18                     max tpr  0.465402    1.000000 117
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
## H2OBinomialMetrics: gbm
## ** Reported on validation data. **
## ** Validation metrics **
## 
## MSE:  0.004685682
## RMSE:  0.06845205
## LogLoss:  0.03068142
## Mean Per-Class Error:  0.5
## AUC:  0.9953052
## AUCPR:  0.999978
## Gini:  0.9906103
## R^2:  -0.007443735
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        FALSE TRUE    Error    Rate
## FALSE      0    1 1.000000    =1/1
## TRUE       0  213 0.000000  =0/213
## Totals     0  214 0.004673  =1/214
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold      value idx
## 1                       max f1  0.922976   0.997658 112
## 2                       max f2  0.922976   0.999062 112
## 3                 max f0point5  0.998608   0.999057 110
## 4                 max accuracy  0.998608   0.995327 110
## 5                max precision  0.999850   1.000000   0
## 6                   max recall  0.922976   1.000000 112
## 7              max specificity  0.999850   1.000000   0
## 8             max absolute_mcc  0.998608   0.705445 110
## 9   max min_per_class_accuracy  0.998608   0.995305 110
## 10 max mean_per_class_accuracy  0.998608   0.997653 110
## 11                     max tns  0.999850   1.000000   0
## 12                     max fns  0.999850 212.000000   0
## 13                     max fps  0.998392   1.000000 111
## 14                     max tps  0.922976 213.000000 112
## 15                     max tnr  0.999850   1.000000   0
## 16                     max fnr  0.999850   0.995305   0
## 17                     max fpr  0.998392   1.000000 111
## 18                     max tpr  0.922976   1.000000 112
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
## H2OBinomialMetrics: gbm
## ** Reported on cross-validation data. **
## ** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
## 
## MSE:  0.004013879
## RMSE:  0.06335518
## LogLoss:  0.02194042
## Mean Per-Class Error:  0.2777778
## AUC:  0.9689611
## AUCPR:  0.9997425
## Gini:  0.9379221
## R^2:  0.4563123
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        FALSE TRUE    Error     Rate
## FALSE      4    5 0.555556     =5/9
## TRUE       0 1201 0.000000  =0/1201
## Totals     4 1206 0.004132  =5/1210
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold       value idx
## 1                       max f1  0.122051    0.997923 205
## 2                       max f2  0.122051    0.999168 205
## 3                 max f0point5  0.807971    0.997670 201
## 4                 max accuracy  0.807971    0.995868 201
## 5                max precision  0.999995    1.000000   0
## 6                   max recall  0.122051    1.000000 205
## 7              max specificity  0.999995    1.000000   0
## 8             max absolute_mcc  0.807971    0.705047 201
## 9   max min_per_class_accuracy  0.999022    0.888889 142
## 10 max mean_per_class_accuracy  0.998719    0.931539 172
## 11                     max tns  0.999995    9.000000   0
## 12                     max fns  0.999995 1199.000000   0
## 13                     max fps  0.004330    9.000000 209
## 14                     max tps  0.122051 1201.000000 205
## 15                     max tnr  0.999995    1.000000   0
## 16                     max fnr  0.999995    0.998335   0
## 17                     max fpr  0.004330    1.000000 209
## 18                     max tpr  0.122051    1.000000 205
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
## Cross-Validation Metrics Summary: 
##                             mean       sd cv_1_valid cv_2_valid cv_3_valid
## accuracy                0.996694 0.003457   1.000000   0.991735   1.000000
## auc                     0.898312 0.212447   1.000000   0.974895   1.000000
## err                     0.003306 0.003457   0.000000   0.008264   0.000000
## err_count               0.800000 0.836660   0.000000   2.000000   0.000000
## f0point5                0.997341 0.002782   1.000000   0.993350   1.000000
## f1                      0.998335 0.001743   1.000000   0.995833   1.000000
## f2                      0.999333 0.000699   1.000000   0.998329   1.000000
## lift_top_group          1.007510 0.004579   1.000000   1.012552   1.008333
## logloss                 0.024430 0.025156   0.011304   0.064126   0.001196
## max_per_class_error     0.333333 0.311805   0.000000   0.666667   0.000000
## mcc                     0.746556 0.179844         NA   0.574950   1.000000
## mean_per_class_accuracy 0.833333 0.155902   1.000000   0.666667   1.000000
## mean_per_class_error    0.166667 0.155902   0.000000   0.333333   0.000000
## mse                     0.004038 0.002854   0.003585   0.008280   0.000248
## pr_auc                  0.997281 0.005894   1.000000   0.999682   1.000000
## precision               0.996680 0.003472   1.000000   0.991701   1.000000
## r2                          -Inf       NA       -Inf   0.323730   0.969708
## recall                  1.000000 0.000000   1.000000   1.000000   1.000000
## rmse                    0.058743 0.027098   0.059875   0.090992   0.015757
## specificity             0.583333 0.288675         NA   0.333333   1.000000
##                         cv_4_valid cv_5_valid
## accuracy                  0.995868   0.995868
## auc                       0.518750   0.997917
## err                       0.004132   0.004132
## err_count                 1.000000   1.000000
## f0point5                  0.996678   0.996678
## f1                        0.997921   0.997921
## f2                        0.999167   0.999167
## lift_top_group            1.008333   1.008333
## logloss                   0.033669   0.011855
## max_per_class_error       0.500000   0.500000
## mcc                       0.705638   0.705638
## mean_per_class_accuracy   0.750000   0.750000
## mean_per_class_error      0.250000   0.250000
## mse                       0.004168   0.003911
## pr_auc                    0.986741   0.999983
## precision                 0.995851   0.995851
## r2                        0.491497   0.522880
## recall                    1.000000   1.000000
## rmse                      0.064558   0.062534
## specificity               0.500000   0.500000

Save

?h2o.getModel
?h2o.saveModel
?h2o.loadModel
best_model <- models_h2o@leader

saved_path <- h2o.saveModel(
  object = best_model,
  path = "h2o_models",
  force = TRUE
)

best_model <- h2o.loadModel(saved_path)

Predictions

predictions <- h2o.predict(best_model, newdata = test_h2o)
## Warning in doTryCatch(return(expr), name, parentenv, handler): Test/Validation
## dataset column 'name' has levels not trained on: ["ALGOT / BROR", "EKENÄS",
## "ENETRI", "FALHOLMEN", "FEJAN", "GAMLEBY", "GENEVAD", "GERSBY", "GRUNDTAL",
## "GRÖNADAL", ...13 not listed..., "PAX / MEHAMN/AULI", "PAX / MEHAMN/SEKKEN",
## "PAX / VINGROM", "PAX / VINTERBRO", "RÅDVIKEN", "SKARPÖ", "SKRUVSTA",
## "STOCKHOLM 2017", "SVENARNE", "TÄRNÖ"]
## Warning in doTryCatch(return(expr), name, parentenv, handler): Test/Validation
## dataset column 'short_description' has levels not trained on: ["1 section,
## 84x40x216 cm", "1 section/shelves/drawers, 48x30x179 cm", "2 sections,
## 165x55x126 cm", "2 sections/shelves, 174x50x124 cm", "2-seat modular sofa,
## outdoor, 161x82x84 cm", "3 sections/shelves, 219x30x226 cm", "3
## sections/shelves, 259x30x226 cm", "3 sections/shelves, 259x50x226 cm", "3
## sections/shelves/cabinet, 259x30x124 cm", "3-seat modular sofa", ...134 not
## listed..., "Wardrobe combination, 250x60x236 cm", "Wardrobe with 2 doors+3
## drawers, 160x42x181 cm", "Wardrobe with 3 doors, 117x176 cm", "Wardrobe with 7
## doors+3 drawers, 240x57x221 cm", "Wardrobe, 175-200x57x251 cm", "Wardrobe,
## 175x58x201 cm", "Wardrobe, 240x57x123 cm", "Wardrobe, 240x57x251 cm", "Window
## table, 80x40x75 cm", "Wing chair"]
## Warning in doTryCatch(return(expr), name, parentenv, handler): Test/Validation
## dataset column 'designer' has levels not trained on: ["104.246.21 KNOPPARP sofa
## is very durable thanks to the metal construction and strong supporting
## fabric.Thanks to the innovative construction, we can use less materials and
## foam when we make KNOPPARP sofa, while the padded cover ensures that the
## comfort is maintained.A sofa with small, neat dimensions which is easy to
## furnish with, even when space is limited. This cover is made from KNISA fabric
## in polyester, which is dope-dyed. It’s a durable material which has a soft
## feel.The dope-dyeing process reduces consumption of water and dyestuff compared
## to traditional dyeing techniques.The cover is easy to keep clean as it is
## removable and can be machine washed.Easy to bring home if you choose to carry
## it on your own. The packaging is just over one metre in height and weighs 17
## kg.10 year guarantee. Read about the terms in the guarantee brochure.This
## cover’s ability to resist abrasion has been tested to handle 40,000 cycles.
## 15,000 cycles or more is suitable for furniture used every day at home. Over
## 30,000 cycles means a good ability to resist abrasion.The cover has a
## lightfastness level of 5-6 (the ability to resist colour fading) on a scale of
## 1 to 8. According to industry standards, a lightfastness level of 4 or higher
## is suitable for home use.", "193.254.57 The cover is easy to keep clean as it
## is removable and can be machine washed.", "392.873.98 Glass doors keep your
## favourite items free from dust but still visible.Adjustable shelves; adapt
## space between shelves according to your needs.Adjustable hinges allow you to
## adjust the door horizontally and vertically.This furniture must be fixed to the
## wall with the enclosed wall fastener.Handle with care! A damaged edge or
## scratched surface can cause the glass to suddenly crack and/or break. Avoid
## collisions from the side - this is where the glass is most vulnerable.Different
## wall materials require different types of fixing devices. Use fixing devices
## suitable for the walls in your home, sold separately.Min. ceiling height
## required: 205 cm.1 fixed shelf and 4 adjustable shelves included.May be
## completed with BILLY height extension unit in the same width for added storage
## vertically.", "404.728.61 Height adjustable armchair which you can swivel to
## the desired height.Slim lines, easy to place.You sit comfortably since the
## chair is adjustable in height.The safety castors have a pressure-sensitive
## brake mechanism that keeps the chair in place when you stand up, and releases
## automatically when you sit down.This product has been developed and tested for
## domestic use.", "602.957.06 With a media shelf you can make the most of the
## wall area, while freeing up space on the floor.Different wall materials require
## different types of fixing devices. Use fixing devices suitable for the walls in
## your home, sold separately.", "704.655.38 You sit comfortably thanks to the
## restful flexibility of the seat.You sit comfortably thanks to the padded
## seat.Velvet.The velvet reflects light in a characteristic way which may make
## the colour appear as if it changes.", "802.945.03 It’s easy to keep the cables
## from your TV and other devices out of sight but close at hand, as there are
## several cable outlets at the back of the TV bench.You can choose to stand the
## TV bench on the floor or mount it on the wall to free up floor space.If you
## want to organise inside you can complement with BESTÅ interior fittings.Steady
## also on uneven floors, thanks to the adjustable feet.This furniture must be
## fixed to the wall with the enclosed wall fastener.This TV bench can take a max
## load of 50 kg on the top.Different wall materials require different types of
## fixing devices. Use fixing devices suitable for the walls in your home, sold
## separately.May be completed with STALLARP, STUBBARP or NANNARP legs. This TV
## bench requires 4 legs and 1 BESTÅ supporting leg.May be completed with SULARP
## legs. This TV bench requires 2 legs and 1 BESTÅ supporting leg.", "904.710.86
## The chair legs are made of solid wood, which is a durable natural material.You
## sit comfortably thanks to the high back and seat with polyester wadding.For
## increased stability, re-tighten the screws about two weeks after assembly and
## when necessary.This chair has been tested for home use and meets the
## requirements for durability and safety, set forth in the following standards:
## EN 12520 and EN 1022.", "Carina Bengs/IKEA of Sweden", "Ebba Strandmark/IKEA of
## Sweden/Ola Wihlborg/Ehlén Johansson", "Ehlén Johansson/Andreas Fredriksson/IKEA
## of Sweden", "Ehlén Johansson/Francis Cayouette/IKEA of Sweden", "Eva Lilja
## Löwenhielm/Jonas Hultqvist/IKEA of Sweden", "Francis Cayouette/IKEA of Sweden",
## "Francis Cayouette/IKEA of Sweden/Ehlén Johansson", "IKEA of Sweden/Carina
## Bengs", "IKEA of Sweden/Ehlén Johansson/Ebba Strandmark", "Lisa Hilland",
## "Magnus Elebäck", "Noboru Nakamura/IKEA of Sweden"]
predictions_tbl <- predictions %>%
    as_tibble()

predictions_tbl %>%
    bind_cols(test_tbl)
## # A tibble: 475 × 14
##    predict   FALSE. TRUE.  item_id name       category     price sellable_online
##    <fct>      <dbl> <dbl>    <dbl> <fct>      <fct>        <dbl> <lgl>          
##  1 TRUE    0.000277 1.000 80155205 STIG       Bar furnitu…  4.23 TRUE           
##  2 TRUE    0.000277 1.000 30180504 NORBERG    Bar furnitu…  5.42 TRUE           
##  3 TRUE    0.000277 1.000 10122647 INGOLF     Bar furnitu…  5.84 TRUE           
##  4 TRUE    0.000277 1.000   121766 INGOLF     Bar furnitu…  5.98 TRUE           
##  5 TRUE    0.000277 1.000   397736 NORRARYD   Bar furnitu…  5.98 TRUE           
##  6 TRUE    0.000277 1.000 50420329 FREKVENS   Bar furnitu…  5.18 TRUE           
##  7 TRUE    0.000277 1.000 70246089 JANINGE    Bar furnitu…  6.39 TRUE           
##  8 TRUE    0.000277 1.000 30352246 RÅSKOG     Bar furnitu…  5.16 TRUE           
##  9 FALSE   0.0771   0.923 10400540 EKEDALEN   Bar furnitu…  5.84 TRUE           
## 10 TRUE    0.000277 1.000 90319918 HENRIKSDAL Bar furnitu…  6.54 TRUE           
## # ℹ 465 more rows
## # ℹ 6 more variables: other_colors <fct>, short_description <fct>,
## #   designer <fct>, depth <dbl>, height <dbl>, width <dbl>
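
Row 9 is labelled FALSE even though its TRUE. probability is about 0.923. One plausible reading, stated here as an assumption rather than something the output confirms, is that the label comes from comparing the TRUE. probability with an F1-optimal threshold stored with the model instead of with 0.5. The sketch uses the max f1 threshold reported on the validation data (0.922976) and the lowest score threshold reported for the test set under Evaluate (0.922942). Whether that lowest score belongs to row 9 is not certain from the rounded output.

```r
# Assumption: the predicted label is TRUE when p(TRUE) >= threshold.
# 0.922976 is the max-F1 threshold reported on the validation data.
threshold <- 0.922976

# Two scores: a typical one (1 - 0.000277) and the lowest test-set
# score threshold reported under Evaluate (0.922942).
p_true <- c(0.999723, 0.922942)

predicted <- factor(p_true >= threshold, levels = c(FALSE, TRUE))
predicted
```

Under this assumption the second score falls just below the threshold and is labelled FALSE, which matches the pattern seen in row 9.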

Evaluate

performance_h2o <- h2o.performance(best_model, newdata = test_h2o)
typeof(performance_h2o)
## [1] "S4"
slotNames(performance_h2o)
## [1] "algorithm" "on_train"  "on_valid"  "on_xval"   "metrics"
performance_h2o@metrics
## $model
## $model$`__meta`
## $model$`__meta`$schema_version
## [1] 3
## 
## $model$`__meta`$schema_name
## [1] "ModelKeyV3"
## 
## $model$`__meta`$schema_type
## [1] "Key<Model>"
## 
## 
## $model$name
## [1] "GBM_4_AutoML_16_20260429_111732"
## 
## $model$type
## [1] "Key<Model>"
## 
## $model$URL
## [1] "/3/Models/GBM_4_AutoML_16_20260429_111732"
## 
## 
## $model_checksum
## [1] "-4146815276140222047"
## 
## $frame
## $frame$name
## [1] "test_tbl_sid_82c8_3"
## 
## 
## $frame_checksum
## [1] "-1254967299183438941"
## 
## $description
## NULL
## 
## $scoring_time
## [1] 1.777476e+12
## 
## $predictions
## NULL
## 
## $MSE
## [1] 0.006312253
## 
## $RMSE
## [1] 0.07944969
## 
## $nobs
## [1] 475
## 
## $custom_metric_name
## NULL
## 
## $custom_metric_value
## [1] 0
## 
## $r2
## [1] -0.005792482
## 
## $logloss
## [1] 0.04432942
## 
## $AUC
## [1] 0.9812853
## 
## $pr_auc
## [1] 0.9998785
## 
## $Gini
## [1] 0.9625706
## 
## $mean_per_class_error
## [1] 0.5
## 
## $domain
## [1] "FALSE" "TRUE" 
## 
## $cm
## $cm$`__meta`
## $cm$`__meta`$schema_version
## [1] 3
## 
## $cm$`__meta`$schema_name
## [1] "ConfusionMatrixV3"
## 
## $cm$`__meta`$schema_type
## [1] "ConfusionMatrix"
## 
## 
## $cm$table
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        FALSE TRUE  Error      Rate
## FALSE      0    3 1.0000 =   3 / 3
## TRUE       0  472 0.0000 = 0 / 472
## Totals     0  475 0.0063 = 3 / 475
## 
## 
## $thresholds_and_metric_scores
## Metrics for Thresholds: Binomial metrics as a function of classification thresholds
##   threshold       f1       f2 f0point5 accuracy precision   recall specificity
## 1  0.999843 0.004228 0.002647 0.010504 0.008421  1.000000 0.002119    1.000000
## 2  0.999828 0.008439 0.005291 0.020833 0.010526  1.000000 0.004237    1.000000
## 3  0.999814 0.012632 0.007932 0.030992 0.012632  1.000000 0.006356    1.000000
## 4  0.999809 0.016807 0.010571 0.040984 0.014737  1.000000 0.008475    1.000000
## 5  0.999809 0.020964 0.013207 0.050813 0.016842  1.000000 0.010593    1.000000
##   absolute_mcc min_per_class_accuracy mean_per_class_accuracy tns fns fps tps
## 1     0.003662               0.002119                0.501059   3 471   0   1
## 2     0.005184               0.004237                0.502119   3 470   0   2
## 3     0.006356               0.006356                0.503178   3 469   0   3
## 4     0.007347               0.008475                0.504237   3 468   0   4
## 5     0.008223               0.010593                0.505297   3 467   0   5
##        tnr      fnr      fpr      tpr idx
## 1 1.000000 0.997881 0.000000 0.002119   0
## 2 1.000000 0.995763 0.000000 0.004237   1
## 3 1.000000 0.993644 0.000000 0.006356   2
## 4 1.000000 0.991525 0.000000 0.008475   3
## 5 1.000000 0.989407 0.000000 0.010593   4
## 
## ---
##     threshold       f1       f2 f0point5 accuracy precision   recall
## 207  0.999362 0.994687 0.992787 0.996593 0.989474  0.997868 0.991525
## 208  0.999358 0.995754 0.994487 0.997024 0.991579  0.997872 0.993644
## 209  0.999349 0.996819 0.996185 0.997453 0.993684  0.997877 0.995763
## 210  0.998378 0.996825 0.997459 0.996193 0.993684  0.995772 0.997881
## 211  0.998073 0.995772 0.997036 0.994510 0.991579  0.993671 0.997881
## 212  0.922942 0.996832 0.998730 0.994941 0.993684  0.993684 1.000000
##     specificity absolute_mcc min_per_class_accuracy mean_per_class_accuracy tns
## 207    0.666667     0.466898               0.666667                0.829096   2
## 208    0.666667     0.512562               0.666667                0.830155   2
## 209    0.666667     0.574289               0.666667                0.831215   2
## 210    0.333333     0.405224               0.333333                0.665607   1
## 211    0.000000     0.003662               0.000000                0.498941   0
## 212    0.000000     0.000000               0.000000                0.500000   0
##     fns fps tps      tnr      fnr      fpr      tpr idx
## 207   4   1 468 0.666667 0.008475 0.333333 0.991525 206
## 208   3   1 469 0.666667 0.006356 0.333333 0.993644 207
## 209   2   1 470 0.666667 0.004237 0.333333 0.995763 208
## 210   1   2 471 0.333333 0.002119 0.666667 0.997881 209
## 211   1   3 471 0.000000 0.002119 1.000000 0.997881 210
## 212   0   3 472 0.000000 0.000000 1.000000 1.000000 211
## 
## $max_criteria_and_metric_scores
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold      value idx
## 1                       max f1  0.922942   0.996832 211
## 2                       max f2  0.922942   0.998730 211
## 3                 max f0point5  0.999349   0.997453 208
## 4                 max accuracy  0.999349   0.993684 208
## 5                max precision  0.999843   1.000000   0
## 6                   max recall  0.922942   1.000000 211
## 7              max specificity  0.999843   1.000000   0
## 8             max absolute_mcc  0.999349   0.574289 208
## 9   max min_per_class_accuracy  0.999722   0.949153 186
## 10 max mean_per_class_accuracy  0.999722   0.974576 186
## 11                     max tns  0.999843   3.000000   0
## 12                     max fns  0.999843 471.000000   0
## 13                     max fps  0.998073   3.000000 210
## 14                     max tps  0.922942 472.000000 211
## 15                     max tnr  0.999843   1.000000   0
## 16                     max fnr  0.999843   0.997881   0
## 17                     max fpr  0.998073   1.000000 210
## 18                     max tpr  0.922942   1.000000 211
## 
## $gains_lift_table
## Gains/Lift Table: Avg response rate: 99.37 %, avg score: 99.96 %
##    group cumulative_data_fraction lower_threshold     lift cumulative_lift
## 1      1               0.01052632        0.999801 1.006356        1.006356
## 2      2               0.02315789        0.999778 1.006356        1.006356
## 3      3               0.03157895        0.999776 1.006356        1.006356
## 4      4               0.04000000        0.999775 1.006356        1.006356
## 5      5               0.05263158        0.999775 1.006356        1.006356
## 6      6               0.10105263        0.999775 1.006356        1.006356
## 7      7               0.15368421        0.999775 1.006356        1.006356
## 8      8               0.20000000        0.999775 1.006356        1.006356
## 9      9               0.30315789        0.999775 1.006356        1.006356
## 10    10               0.69894737        0.999775 1.006356        1.006356
## 11    11               0.69894737        0.999775 0.000000        1.006356
## 12    12               0.80000000        0.999775 1.006356        1.006356
## 13    13               0.89894737        0.999727 1.006356        1.006356
## 14    14               1.00000000        0.922942 0.943459        1.000000
##    response_rate    score cumulative_response_rate cumulative_score
## 1       1.000000 0.999820                 1.000000         0.999820
## 2       1.000000 0.999788                 1.000000         0.999803
## 3       1.000000 0.999776                 1.000000         0.999796
## 4       1.000000 0.999776                 1.000000         0.999791
## 5       1.000000 0.999775                 1.000000         0.999788
## 6       1.000000 0.999775                 1.000000         0.999782
## 7       1.000000 0.999775                 1.000000         0.999780
## 8       1.000000 0.999775                 1.000000         0.999779
## 9       1.000000 0.999775                 1.000000         0.999778
## 10      1.000000 0.999775                 1.000000         0.999776
## 11      0.000000 0.000000                 1.000000         0.999776
## 12      1.000000 0.999775                 1.000000         0.999776
## 13      1.000000 0.999759                 1.000000         0.999774
## 14      0.937500 0.997991                 0.993684         0.999594
##    capture_rate cumulative_capture_rate        gain cumulative_gain
## 1      0.010593                0.010593    0.635593        0.635593
## 2      0.012712                0.023305    0.635593        0.635593
## 3      0.008475                0.031780    0.635593        0.635593
## 4      0.008475                0.040254    0.635593        0.635593
## 5      0.012712                0.052966    0.635593        0.635593
## 6      0.048729                0.101695    0.635593        0.635593
## 7      0.052966                0.154661    0.635593        0.635593
## 8      0.046610                0.201271    0.635593        0.635593
## 9      0.103814                0.305085    0.635593        0.635593
## 10     0.398305                0.703390    0.635593        0.635593
## 11     0.000000                0.703390 -100.000000        0.635593
## 12     0.101695                0.805085    0.635593        0.635593
## 13     0.099576                0.904661    0.635593        0.635593
## 14     0.095339                1.000000   -5.654131        0.000000
##    kolmogorov_smirnov
## 1            0.010593
## 2            0.023305
## 3            0.031780
## 4            0.040254
## 5            0.052966
## 6            0.101695
## 7            0.154661
## 8            0.201271
## 9            0.305085
## 10           0.703390
## 11           0.703390
## 12           0.805085
## 13           0.904661
## 14           0.000000
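
The lift and gain columns in the gains/lift table can be checked by hand. The sketch below is base R only and assumes 472 positives in 475 rows (the `tps` and `fps` counts in the last threshold row, consistent with the 99.37 % average response rate in the table header). It also assumes lift is the group response rate divided by the average response rate, and gain is `(lift - 1) * 100`, which is consistent with the printed values. The variable names are illustrative.

```r
# Counts assumed from the threshold table above: 472 positives, 3 negatives.
n_total    <- 475
n_positive <- 472
avg_response_rate <- n_positive / n_total  # about 0.9937, i.e. 99.37 %

# Response rates of the first and last groups, copied from the table.
response_rate <- c(group_1 = 1.000000, group_14 = 0.937500)

# Lift: how much more often a group responds than the average row.
lift <- response_rate / avg_response_rate

# Gain: lift expressed as a percentage change over the average.
gain <- (lift - 1) * 100

print(round(lift, 6))
print(round(gain, 6))
```

Because nearly every row is a positive, no group can beat the average by more than about 0.64 %, which is why the lift column is almost flat.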
h2o.auc(performance_h2o)
## [1] 0.9812853
h2o.accuracy(performance_h2o)
##   threshold    accuracy
## 1 0.9998428 0.008421053
## 2 0.9998275 0.010526316
## 3 0.9998137 0.012631579
## 4 0.9998093 0.014736842
## 5 0.9998093 0.016842105
## 
## ---
##     threshold  accuracy
## 207 0.9993616 0.9894737
## 208 0.9993584 0.9915789
## 209 0.9993487 0.9936842
## 210 0.9983778 0.9936842
## 211 0.9980734 0.9915789
## 212 0.9229421 0.9936842
h2o.confusionMatrix(performance_h2o)
## Confusion Matrix (vertical: actual; across: predicted)  for max f1 @ threshold = 0.922942127589153:
##        FALSE TRUE    Error    Rate
## FALSE      0    3 1.000000    =3/3
## TRUE       0  472 0.000000  =0/472
## Totals     0  475 0.006316  =3/475
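
To make the confusion matrix easier to read, the following base R sketch recomputes a few metrics from its four cells. The counts are copied from the matrix above; the variable names are illustrative and not part of the h2o API.

```r
# Cell counts copied from the confusion matrix (rows: actual, columns: predicted).
tn <- 0; fp <- 3    # actual FALSE: none predicted FALSE, 3 predicted TRUE
fn <- 0; tp <- 472  # actual TRUE: none predicted FALSE, 472 predicted TRUE

accuracy    <- (tp + tn) / (tp + tn + fp + fn)  # share of all rows classified correctly
recall      <- tp / (tp + fn)                   # share of actual TRUE rows found
specificity <- tn / (tn + fp)                   # share of actual FALSE rows found
mean_per_class_accuracy <- (recall + specificity) / 2

print(c(accuracy = accuracy,
        recall = recall,
        specificity = specificity,
        mean_per_class_accuracy = mean_per_class_accuracy))
```

At this threshold every row is predicted TRUE, so accuracy is high only because 472 of the 475 rows are TRUE; specificity is 0 and mean per-class accuracy is 0.5, matching row 212 of the threshold metrics.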
h2o.metric(performance_h2o)
## Metrics for Thresholds: Binomial metrics as a function of classification thresholds
##   threshold       f1       f2 f0point5 accuracy precision   recall specificity
## 1  0.999843 0.004228 0.002647 0.010504 0.008421  1.000000 0.002119    1.000000
## 2  0.999828 0.008439 0.005291 0.020833 0.010526  1.000000 0.004237    1.000000
## 3  0.999814 0.012632 0.007932 0.030992 0.012632  1.000000 0.006356    1.000000
## 4  0.999809 0.016807 0.010571 0.040984 0.014737  1.000000 0.008475    1.000000
## 5  0.999809 0.020964 0.013207 0.050813 0.016842  1.000000 0.010593    1.000000
##   absolute_mcc min_per_class_accuracy mean_per_class_accuracy tns fns fps tps
## 1     0.003662               0.002119                0.501059   3 471   0   1
## 2     0.005184               0.004237                0.502119   3 470   0   2
## 3     0.006356               0.006356                0.503178   3 469   0   3
## 4     0.007347               0.008475                0.504237   3 468   0   4
## 5     0.008223               0.010593                0.505297   3 467   0   5
##        tnr      fnr      fpr      tpr idx
## 1 1.000000 0.997881 0.000000 0.002119   0
## 2 1.000000 0.995763 0.000000 0.004237   1
## 3 1.000000 0.993644 0.000000 0.006356   2
## 4 1.000000 0.991525 0.000000 0.008475   3
## 5 1.000000 0.989407 0.000000 0.010593   4
## 
## ---
##     threshold       f1       f2 f0point5 accuracy precision   recall
## 207  0.999362 0.994687 0.992787 0.996593 0.989474  0.997868 0.991525
## 208  0.999358 0.995754 0.994487 0.997024 0.991579  0.997872 0.993644
## 209  0.999349 0.996819 0.996185 0.997453 0.993684  0.997877 0.995763
## 210  0.998378 0.996825 0.997459 0.996193 0.993684  0.995772 0.997881
## 211  0.998073 0.995772 0.997036 0.994510 0.991579  0.993671 0.997881
## 212  0.922942 0.996832 0.998730 0.994941 0.993684  0.993684 1.000000
##     specificity absolute_mcc min_per_class_accuracy mean_per_class_accuracy tns
## 207    0.666667     0.466898               0.666667                0.829096   2
## 208    0.666667     0.512562               0.666667                0.830155   2
## 209    0.666667     0.574289               0.666667                0.831215   2
## 210    0.333333     0.405224               0.333333                0.665607   1
## 211    0.000000     0.003662               0.000000                0.498941   0
## 212    0.000000     0.000000               0.000000                0.500000   0
##     fns fps tps      tnr      fnr      fpr      tpr idx
## 207   4   1 468 0.666667 0.008475 0.333333 0.991525 206
## 208   3   1 469 0.666667 0.006356 0.333333 0.993644 207
## 209   2   1 470 0.666667 0.004237 0.333333 0.995763 208
## 210   1   2 471 0.333333 0.002119 0.666667 0.997881 209
## 211   1   3 471 0.000000 0.002119 1.000000 0.997881 210
## 212   0   3 472 0.000000 0.000000 1.000000 1.000000 211

Conclusion

The previous model using XGBoost had an accuracy of 0.996 and an AUC of 0.968. The H2O AutoML model reached an accuracy of about 0.994 at the max-F1 threshold (3 errors in 475 rows, per the confusion matrix) and an improved AUC of 0.981.

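Even though the accuracy slightly decreased, the increase in AUC suggests that the H2O model does a better job overall at distinguishing between classes. This matters because AUC is threshold-independent: it measures how well the model ranks positive cases above negative ones across all cutoffs, whereas accuracy depends on a single threshold and is dominated by the majority class when the classes are as imbalanced as they are here (99.37 % average response rate).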
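
The difference between the two metrics can be illustrated with a small base R sketch. The labels and scores below are hypothetical, made up for illustration only, and `auc_rank()` and `accuracy_at()` are helper names introduced here, not h2o functions. AUC is computed with the rank-based (Mann-Whitney) formula.

```r
# Hypothetical labels and scores, for illustration only (not from the attrition data).
labels <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
scores <- c(0.99, 0.97, 0.95, 0.60, 0.96, 0.40)

# Rank-based AUC: the share of (positive, negative) pairs in which
# the positive case receives the higher score.
auc_rank <- function(labels, scores) {
  r     <- rank(scores)  # average ranks, so ties count as half
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Accuracy depends on the cutoff used to turn scores into classes.
accuracy_at <- function(threshold) mean((scores >= threshold) == labels)

auc <- auc_rank(labels, scores)
print(auc)
print(accuracy_at(0.5))
print(accuracy_at(0.965))
```

The same scores give a single AUC value but different accuracies at different cutoffs, which is why the two metrics can move in opposite directions when two models are compared.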

Another key takeaway is that H2O AutoML automatically built and tuned multiple models in much less time than manually tuning XGBoost. This makes it a more efficient approach that still produces strong results.

Overall, H2O AutoML provided a better balance between performance and efficiency, making it the stronger approach for this problem.