Setup

Start R, and type install.packages("h2o") if needed. Load the package and start H2O:

# install.packages("h2o")
library(h2o)
h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         32 seconds 944 milliseconds 
##     H2O cluster timezone:       UTC 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.44.0.3 
##     H2O cluster version age:    2 years, 1 month and 13 days 
##     H2O cluster name:           H2O_started_from_R_r3583878_znd758 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   0.17 GB 
##     H2O cluster total cores:    1 
##     H2O cluster allowed cores:  1 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 4.5.2 (2025-10-31)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is (2 years, 1 month and 13 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html

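Older releases of the h2o package used only two cores by default; recent releases default to nthreads = -1, which means all available cores. To request all cores explicitly (the separate max_mem_size argument, e.g. max_mem_size = "4g", raises the memory limit), call h2o.init() as shown here. When a cluster is already running, h2o.init() connects to it rather than starting a new one, so the output below still describes the cluster started above: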

library(h2o)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         33 seconds 14 milliseconds 
##     H2O cluster timezone:       UTC 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.44.0.3 
##     H2O cluster version age:    2 years, 1 month and 13 days 
##     H2O cluster name:           H2O_started_from_R_r3583878_znd758 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   0.17 GB 
##     H2O cluster total cores:    1 
##     H2O cluster allowed cores:  1 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 4.5.2 (2025-10-31)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is (2 years, 1 month and 13 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html

Workflow Overview

The workflow has five steps: 1–3 prepare the data, 4 trains the model, and 5 uses that model.

1. Get data into the H2O cluster

First major concept: all the data lives on the cluster (the server), not on the client—even when client and cluster are the same machine. So whenever we want to train a model or make a prediction, we have to get the data into the H2O cluster.

datasets <- "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
data <- h2o.importFile(paste0(datasets, "iris_wheader.csv"))
##   |======================================================================| 100%

This creates a frame on the cluster (e.g. iris_wheader.hex). H2O infers that the class column is categorical, so we will do multinomial classification, not regression.

2. Define target and features

Define y (what we want to predict) and x (what we learn from): use the four measurements to predict species.

y <- "class"
x <- setdiff(names(data), y)
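
To see what setdiff() does here without a running cluster, the sketch below applies it to a plain character vector. The column names are hypothetical stand-ins for names(data); the real names come from the header row of the CSV file.

```r
# Hypothetical column names standing in for names(data).
cols <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "class")

y <- "class"            # target column
x <- setdiff(cols, y)   # every remaining column becomes a feature

print(x)
```

Because x is derived from the column names, it stays correct if feature columns are added to or removed from the data.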

3. Split into train and test

We randomly assign about 80% of the rows to training and the remaining 20% or so to testing, which lets us assess how well the model generalizes. The test set stands in for the new flowers we would want to classify in production.

parts <- h2o.splitFrame(data, 0.8)
train <- parts[[1]]
test <- parts[[2]]

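h2o.splitFrame() returns a list; we assign the two parts to train and test for clarity. The split is approximate rather than exact: in this run, train has 112 rows and test has 38 (the row totals in the confusion matrices below), not exactly 120 and 30.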
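
The split sizes are not exactly 80/20 because rows are assigned at random. The base R sketch below illustrates the idea with an independent random draw per row. It is not H2O's actual implementation, and the seed is an arbitrary choice made only so the sketch is repeatable.

```r
set.seed(42)                 # arbitrary seed, only for repeatability
n <- 150                     # the iris data has 150 rows
in_train <- runif(n) < 0.8   # each row lands in train with probability 0.8

train_rows <- which(in_train)
test_rows  <- which(!in_train)

# The two parts always cover every row exactly once,
# but their sizes vary from one seed to another.
c(train = length(train_rows), test = length(test_rows))
```

With a different seed the counts change, which is also why two runs of this tutorial can report slightly different metrics.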

4. Train the model

In R, training is a single function call: we pass the feature names, target name, and training data.

m <- h2o.deeplearning(x, y, train)
##   |======================================================================| 100%

5. Predict on the test set

h2o.predict() returns a handle to a frame stored on the H2O cluster, not the predictions themselves. Printing p shows only the first few rows.

p <- h2o.predict(m, test)
##   |======================================================================| 100%

Model evaluation

Training metrics

h2o.mse(m)
## [1] 0.2043511
h2o.confusionMatrix(m)
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##                 Iris-setosa Iris-versicolor Iris-virginica  Error       Rate
## Iris-setosa              38               0              0 0.0000 =   0 / 38
## Iris-versicolor           0              35              0 0.0000 =   0 / 35
## Iris-virginica            0              29             10 0.7436 =  29 / 39
## Totals                   38              64             10 0.2589 = 29 / 112
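
The Error column can be reproduced by hand. The sketch below retypes the training confusion matrix above as a plain R matrix (rows are actual classes, columns are predicted classes) and derives the per-class and overall error rates; the variable names are our own.

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Counts copied from the training confusion matrix printed above.
cm <- matrix(c(38,  0,  0,
                0, 35,  0,
                0, 29, 10),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = classes, predicted = classes))

per_class_error <- 1 - diag(cm) / rowSums(cm)   # share of each actual class that was missed
overall_error   <- 1 - sum(diag(cm)) / sum(cm)  # share of all rows that were missed

round(per_class_error, 4)
round(overall_error, 4)
```

The diagonal holds the correct predictions, so everything off the diagonal in a row counts as an error for that class.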

Predictions in R

The first column is the predicted class; the other three are class probabilities (confidence).

as.data.frame(p)
##            predict  Iris.setosa Iris.versicolor Iris.virginica
## 1      Iris-setosa 9.984268e-01    0.0015732402   1.761863e-12
## 2      Iris-setosa 9.989979e-01    0.0010020914   6.544014e-13
## 3      Iris-setosa 9.989884e-01    0.0010115816   6.626523e-13
## 4      Iris-setosa 9.998739e-01    0.0001261053   1.124448e-13
## 5      Iris-setosa 9.998586e-01    0.0001414487   2.640421e-13
## 6      Iris-setosa 9.996866e-01    0.0003134027   4.868249e-13
## 7      Iris-setosa 9.998372e-01    0.0001628131   6.146253e-13
## 8      Iris-setosa 9.993633e-01    0.0006366508   4.228722e-13
## 9      Iris-setosa 9.979659e-01    0.0020341248   4.791745e-12
## 10     Iris-setosa 9.989979e-01    0.0010020914   6.544014e-13
## 11     Iris-setosa 9.995746e-01    0.0004253715   8.550194e-12
## 12     Iris-setosa 9.994883e-01    0.0005117304   4.695759e-13
## 13 Iris-versicolor 3.657809e-04    0.9993723222   2.618969e-04
## 14 Iris-versicolor 4.264699e-03    0.9949285637   8.067370e-04
## 15 Iris-versicolor 9.883547e-03    0.9895293930   5.870602e-04
## 16 Iris-versicolor 6.438549e-03    0.9909389639   2.622487e-03
## 17 Iris-versicolor 2.429952e-01    0.7569885610   1.622068e-05
## 18 Iris-versicolor 4.708610e-04    0.9993168766   2.122624e-04
## 19 Iris-versicolor 1.643375e-02    0.9826088475   9.574029e-04
## 20 Iris-versicolor 3.046221e-02    0.9671059356   2.431850e-03
## 21 Iris-versicolor 1.804881e-03    0.9979453873   2.497314e-04
## 22 Iris-versicolor 1.668673e-03    0.9981769003   1.544269e-04
## 23 Iris-versicolor 6.338004e-02    0.9336398610   2.980097e-03
## 24 Iris-versicolor 5.782210e-04    0.9985572130   8.645660e-04
## 25 Iris-versicolor 9.190633e-05    0.9995651762   3.429175e-04
## 26 Iris-versicolor 1.429167e-02    0.9853154888   3.928399e-04
## 27 Iris-versicolor 1.955184e-02    0.9801834669   2.646915e-04
## 28 Iris-versicolor 2.276464e-02    0.9430462145   3.418915e-02
## 29  Iris-virginica 1.416489e-04    0.3924121729   6.074462e-01
## 30 Iris-versicolor 2.161628e-03    0.7864718908   2.113665e-01
## 31 Iris-versicolor 1.134350e-05    0.9573929886   4.259567e-02
## 32 Iris-versicolor 1.984401e-05    0.5377778757   4.622023e-01
## 33 Iris-versicolor 1.564917e-05    0.9610914551   3.889290e-02
## 34 Iris-versicolor 2.540272e-03    0.9725826477   2.487708e-02
## 35  Iris-virginica 7.423446e-06    0.1923704711   8.076221e-01
## 36  Iris-virginica 9.353449e-06    0.1113822684   8.886084e-01
## 37 Iris-versicolor 1.389136e-04    0.8786868069   1.211743e-01
## 38  Iris-virginica 5.928886e-04    0.3335486606   6.658585e-01
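
In this output the predict column always names the class with the largest probability. The sketch below checks that on three rows copied from the table above (rows 17, 29 and 32), using plain R rather than H2O frames.

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Probabilities copied from the prediction table above.
probs <- matrix(c(
  2.429952e-01, 0.7569885610, 1.622068e-05,   # row 17
  1.416489e-04, 0.3924121729, 6.074462e-01,   # row 29
  1.984401e-05, 0.5377778757, 4.622023e-01    # row 32
), ncol = 3, byrow = TRUE, dimnames = list(c("17", "29", "32"), classes))

predicted  <- classes[max.col(probs, ties.method = "first")]  # column with the largest probability
confidence <- apply(probs, 1, max)                            # that largest probability

data.frame(predicted, confidence = round(confidence, 3))
```

Row 32 is a close call: versicolor wins with a probability of only about 0.54, so the confidence value is worth reading alongside the label.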

Compare predictions to actuals

The true species is in test$class. We can bind predicted and actual to see which rows were wrong:

as.data.frame(h2o.cbind(p$predict, test$class))
##            predict           class
## 1      Iris-setosa     Iris-setosa
## 2      Iris-setosa     Iris-setosa
## 3      Iris-setosa     Iris-setosa
## 4      Iris-setosa     Iris-setosa
## 5      Iris-setosa     Iris-setosa
## 6      Iris-setosa     Iris-setosa
## 7      Iris-setosa     Iris-setosa
## 8      Iris-setosa     Iris-setosa
## 9      Iris-setosa     Iris-setosa
## 10     Iris-setosa     Iris-setosa
## 11     Iris-setosa     Iris-setosa
## 12     Iris-setosa     Iris-setosa
## 13 Iris-versicolor Iris-versicolor
## 14 Iris-versicolor Iris-versicolor
## 15 Iris-versicolor Iris-versicolor
## 16 Iris-versicolor Iris-versicolor
## 17 Iris-versicolor Iris-versicolor
## 18 Iris-versicolor Iris-versicolor
## 19 Iris-versicolor Iris-versicolor
## 20 Iris-versicolor Iris-versicolor
## 21 Iris-versicolor Iris-versicolor
## 22 Iris-versicolor Iris-versicolor
## 23 Iris-versicolor Iris-versicolor
## 24 Iris-versicolor Iris-versicolor
## 25 Iris-versicolor Iris-versicolor
## 26 Iris-versicolor Iris-versicolor
## 27 Iris-versicolor Iris-versicolor
## 28 Iris-versicolor  Iris-virginica
## 29  Iris-virginica  Iris-virginica
## 30 Iris-versicolor  Iris-virginica
## 31 Iris-versicolor  Iris-virginica
## 32 Iris-versicolor  Iris-virginica
## 33 Iris-versicolor  Iris-virginica
## 34 Iris-versicolor  Iris-virginica
## 35  Iris-virginica  Iris-virginica
## 36  Iris-virginica  Iris-virginica
## 37 Iris-versicolor  Iris-virginica
## 38  Iris-virginica  Iris-virginica

Accuracy

We can compute the proportion of test samples the model got right with mean(p$predict == test$class):

mean(p$predict == test$class)
## [1] 0.8157895
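
The same calculation can be followed step by step in base R. The sketch below rebuilds the predicted and actual labels from the comparison table above as ordinary character vectors (our own local copies, not H2O frames) and computes the share of matches.

```r
actual <- rep(c("Iris-setosa", "Iris-versicolor", "Iris-virginica"),
              times = c(12, 15, 11))

predicted <- c(rep("Iris-setosa", 12),
               rep("Iris-versicolor", 15),
               # the 11 virginica rows (28 to 38), in printed order
               "Iris-versicolor", "Iris-virginica", rep("Iris-versicolor", 5),
               "Iris-virginica", "Iris-virginica", "Iris-versicolor", "Iris-virginica")

correct  <- predicted == actual   # TRUE where the prediction matches
accuracy <- mean(correct)         # TRUE counts as 1, FALSE as 0

accuracy
table(actual, predicted)          # cross-tabulation of actual vs predicted
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is why this one-liner works as an accuracy measure.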

Test set performance

Alternatively, use h2o.performance() on the test set to get MSE, confusion matrix, and hit ratios:

h2o.performance(m, test)
## H2OMultinomialMetrics: deeplearning
## 
## Test Set Metrics: 
## =====================
## 
## MSE: (Extract with `h2o.mse`) 0.1523254
## RMSE: (Extract with `h2o.rmse`) 0.3902889
## Logloss: (Extract with `h2o.logloss`) 0.5163046
## Mean Per-Class Error: 0.2121212
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##                 Iris-setosa Iris-versicolor Iris-virginica  Error     Rate
## Iris-setosa              12               0              0 0.0000 = 0 / 12
## Iris-versicolor           0              15              0 0.0000 = 0 / 15
## Iris-virginica            0               7              4 0.6364 = 7 / 11
## Totals                   12              22              4 0.1842 = 7 / 38
## 
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.815789
## 2 2  1.000000
## 3 3  1.000000

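The Hit Ratio Table confirms the accuracy computed above: the hit ratio at k = 1 is 0.8158 (31 of 38). The hit ratio at k = 2 is 1, meaning the true class was always among the model's two most probable classes. The confusion matrix shows where the errors fall: all 7 mistakes are Iris-virginica flowers predicted as Iris-versicolor, while every setosa and versicolor was classified correctly.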
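
A hit ratio at k counts a prediction as correct when the true class is among the k most probable classes. The sketch below implements that definition in base R on four virginica rows copied from the prediction table (rows 28, 29, 30 and 35). The hit_ratio() helper is our own illustration, not an H2O function.

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Probabilities for rows 28, 29, 30 and 35 of the prediction table.
probs <- matrix(c(
  2.276464e-02, 0.9430462145, 3.418915e-02,
  1.416489e-04, 0.3924121729, 6.074462e-01,
  2.161628e-03, 0.7864718908, 2.113665e-01,
  7.423446e-06, 0.1923704711, 8.076221e-01
), ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))
actual <- rep("Iris-virginica", 4)   # all four rows are truly virginica

# Share of rows whose true class is among the k most probable classes.
hit_ratio <- function(probs, actual, k) {
  hits <- vapply(seq_len(nrow(probs)), function(i) {
    top_k <- names(sort(probs[i, ], decreasing = TRUE))[seq_len(k)]
    actual[i] %in% top_k
  }, logical(1))
  mean(hits)
}

sapply(1:3, function(k) hit_ratio(probs, actual, k))
```

On this four-row subset only half of the first guesses are right, but virginica is always the second choice when it is not the first, so the ratio reaches 1 at k = 2, the same pattern as in the full test set.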