Start R, and type install.packages("h2o") if needed.
Load the package and start H2O:
# install.packages("h2o")
library(h2o)
h2o.init()
## Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 32 seconds 944 milliseconds
## H2O cluster timezone: UTC
## H2O data parsing timezone: UTC
## H2O cluster version: 3.44.0.3
## H2O cluster version age: 2 years, 1 month and 13 days
## H2O cluster name: H2O_started_from_R_r3583878_znd758
## H2O cluster total nodes: 1
## H2O cluster total memory: 0.17 GB
## H2O cluster total cores: 1
## H2O cluster allowed cores: 1
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## R Version: R version 4.5.2 (2025-10-31)
## Warning in h2o.clusterInfo():
## Your H2O cluster version is (2 years, 1 month and 13 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
By default, h2o.init() uses only two cores. To use all available cores,
pass nthreads = -1 (a sketch for also giving the cluster more memory follows
the output below). Note that if an H2O cluster is already running, h2o.init()
simply reconnects to it rather than restarting it with the new settings; call
h2o.shutdown() first if you need them to take effect.
library(h2o)
h2o.init(nthreads = -1)
## Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 33 seconds 14 milliseconds
## H2O cluster timezone: UTC
## H2O data parsing timezone: UTC
## H2O cluster version: 3.44.0.3
## H2O cluster version age: 2 years, 1 month and 13 days
## H2O cluster name: H2O_started_from_R_r3583878_znd758
## H2O cluster total nodes: 1
## H2O cluster total memory: 0.17 GB
## H2O cluster total cores: 1
## H2O cluster allowed cores: 1
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## R Version: R version 4.5.2 (2025-10-31)
## Warning in h2o.clusterInfo():
## Your H2O cluster version is (2 years, 1 month and 13 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
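If you also want to give the cluster more memory, h2o.init() accepts max_mem_size. A minimal sketch, assuming you are happy to restart the cluster and that 4 GB is a reasonable allowance for your machine:
# h2o.shutdown(prompt = FALSE)                  # stop the running cluster first
h2o.init(nthreads = -1, max_mem_size = "4g")    # all cores, 4 GB for the H2O heap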
The workflow has five steps: steps 1 to 3 prepare the data (import it, pick the columns, split it into training and test sets), step 4 trains the model, and step 5 uses that model to make predictions.
First major concept: all the data lives on the cluster (the server), not on the client—even when client and cluster are the same machine. So whenever we want to train a model or make a prediction, we have to get the data into the H2O cluster.
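For example, a data.frame sitting in your R session is invisible to H2O until you copy it across. A quick sketch using R's built-in iris data (the frame name is arbitrary):
iris_hex <- as.h2o(iris)   # upload the local data.frame to the H2O cluster
h2o.ls()                   # list the frames currently held on the cluster
For this example, though, we import a CSV file straight from a URL into the cluster: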
datasets <- "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
data <- h2o.importFile(paste0(datasets, "iris_wheader.csv"))
This creates a frame on the cluster
(e.g. iris_wheader.hex). H2O infers that the
class column is categorical, so we will do
multinomial classification, not regression.
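It is worth checking that inference before training. A short sketch; h2o.describe() reports each column's type, and the commented line shows how you would force a numeric column to be categorical if needed:
h2o.describe(data)                     # column types, ranges, missing counts
# data$class <- as.factor(data$class)  # only needed if the response were read as numeric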
Define y (what we want to predict) and x (what we learn from): use the four measurements to predict species.
y <- "class"
x <- setdiff(names(data), y)
We randomly use 80% of the data for training and hold back 20% for testing, so we can assess how well the model generalizes; the test set stands in for the new flowers we would want to classify in production.
parts <- h2o.splitFrame(data, 0.8)
train <- parts[[1]]
test <- parts[[2]]
h2o.splitFrame() returns a list; we assign the two parts
to train and test for clarity.
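Because the split is random, every run produces slightly different train and test sets. If you need a repeatable split, h2o.splitFrame() takes a seed; a sketch (the seed value is arbitrary):
parts <- h2o.splitFrame(data, ratios = 0.8, seed = 99)  # reproducible 80/20 split
train <- parts[[1]]
test <- parts[[2]]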
In R, training is a single function call: we pass the feature names, target name, and training data.
m <- h2o.deeplearning(x, y, train)
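Everything was left at its default above. h2o.deeplearning() also accepts tuning arguments such as hidden (the sizes of the hidden layers) and epochs (passes over the training data); a sketch with illustrative values, not a recommendation:
m2 <- h2o.deeplearning(x, y, train,
                       hidden = c(200, 200),  # two hidden layers of 200 neurons
                       epochs = 50,           # train for longer than the default
                       seed = 99)             # fully reproducible only with reproducible = TRUE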
h2o.predict() returns a handle to a frame on the H2O server; printing p would
show just its first rows, while as.data.frame(p), used below, pulls the whole
frame into R. Calling h2o.mse(m) or h2o.confusionMatrix(m) with only the model,
as in the next two commands, reports metrics on the training data.
p <- h2o.predict(m, test)
h2o.mse(m)
## [1] 0.2043511
h2o.confusionMatrix(m)
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Iris-setosa Iris-versicolor Iris-virginica Error Rate
## Iris-setosa 38 0 0 0.0000 = 0 / 38
## Iris-versicolor 0 35 0 0.0000 = 0 / 35
## Iris-virginica 0 29 10 0.7436 = 29 / 39
## Totals 38 64 10 0.2589 = 29 / 112
The first column is the predicted class; the other three are class probabilities (confidence).
as.data.frame(p)
## predict Iris.setosa Iris.versicolor Iris.virginica
## 1 Iris-setosa 9.984268e-01 0.0015732402 1.761863e-12
## 2 Iris-setosa 9.989979e-01 0.0010020914 6.544014e-13
## 3 Iris-setosa 9.989884e-01 0.0010115816 6.626523e-13
## 4 Iris-setosa 9.998739e-01 0.0001261053 1.124448e-13
## 5 Iris-setosa 9.998586e-01 0.0001414487 2.640421e-13
## 6 Iris-setosa 9.996866e-01 0.0003134027 4.868249e-13
## 7 Iris-setosa 9.998372e-01 0.0001628131 6.146253e-13
## 8 Iris-setosa 9.993633e-01 0.0006366508 4.228722e-13
## 9 Iris-setosa 9.979659e-01 0.0020341248 4.791745e-12
## 10 Iris-setosa 9.989979e-01 0.0010020914 6.544014e-13
## 11 Iris-setosa 9.995746e-01 0.0004253715 8.550194e-12
## 12 Iris-setosa 9.994883e-01 0.0005117304 4.695759e-13
## 13 Iris-versicolor 3.657809e-04 0.9993723222 2.618969e-04
## 14 Iris-versicolor 4.264699e-03 0.9949285637 8.067370e-04
## 15 Iris-versicolor 9.883547e-03 0.9895293930 5.870602e-04
## 16 Iris-versicolor 6.438549e-03 0.9909389639 2.622487e-03
## 17 Iris-versicolor 2.429952e-01 0.7569885610 1.622068e-05
## 18 Iris-versicolor 4.708610e-04 0.9993168766 2.122624e-04
## 19 Iris-versicolor 1.643375e-02 0.9826088475 9.574029e-04
## 20 Iris-versicolor 3.046221e-02 0.9671059356 2.431850e-03
## 21 Iris-versicolor 1.804881e-03 0.9979453873 2.497314e-04
## 22 Iris-versicolor 1.668673e-03 0.9981769003 1.544269e-04
## 23 Iris-versicolor 6.338004e-02 0.9336398610 2.980097e-03
## 24 Iris-versicolor 5.782210e-04 0.9985572130 8.645660e-04
## 25 Iris-versicolor 9.190633e-05 0.9995651762 3.429175e-04
## 26 Iris-versicolor 1.429167e-02 0.9853154888 3.928399e-04
## 27 Iris-versicolor 1.955184e-02 0.9801834669 2.646915e-04
## 28 Iris-versicolor 2.276464e-02 0.9430462145 3.418915e-02
## 29 Iris-virginica 1.416489e-04 0.3924121729 6.074462e-01
## 30 Iris-versicolor 2.161628e-03 0.7864718908 2.113665e-01
## 31 Iris-versicolor 1.134350e-05 0.9573929886 4.259567e-02
## 32 Iris-versicolor 1.984401e-05 0.5377778757 4.622023e-01
## 33 Iris-versicolor 1.564917e-05 0.9610914551 3.889290e-02
## 34 Iris-versicolor 2.540272e-03 0.9725826477 2.487708e-02
## 35 Iris-virginica 7.423446e-06 0.1923704711 8.076221e-01
## 36 Iris-virginica 9.353449e-06 0.1113822684 8.886084e-01
## 37 Iris-versicolor 1.389136e-04 0.8786868069 1.211743e-01
## 38 Iris-virginica 5.928886e-04 0.3335486606 6.658585e-01
The true species is in test$class. We can bind predicted
and actual to see which rows were wrong:
as.data.frame(h2o.cbind(p$predict, test$class))
## predict class
## 1 Iris-setosa Iris-setosa
## 2 Iris-setosa Iris-setosa
## 3 Iris-setosa Iris-setosa
## 4 Iris-setosa Iris-setosa
## 5 Iris-setosa Iris-setosa
## 6 Iris-setosa Iris-setosa
## 7 Iris-setosa Iris-setosa
## 8 Iris-setosa Iris-setosa
## 9 Iris-setosa Iris-setosa
## 10 Iris-setosa Iris-setosa
## 11 Iris-setosa Iris-setosa
## 12 Iris-setosa Iris-setosa
## 13 Iris-versicolor Iris-versicolor
## 14 Iris-versicolor Iris-versicolor
## 15 Iris-versicolor Iris-versicolor
## 16 Iris-versicolor Iris-versicolor
## 17 Iris-versicolor Iris-versicolor
## 18 Iris-versicolor Iris-versicolor
## 19 Iris-versicolor Iris-versicolor
## 20 Iris-versicolor Iris-versicolor
## 21 Iris-versicolor Iris-versicolor
## 22 Iris-versicolor Iris-versicolor
## 23 Iris-versicolor Iris-versicolor
## 24 Iris-versicolor Iris-versicolor
## 25 Iris-versicolor Iris-versicolor
## 26 Iris-versicolor Iris-versicolor
## 27 Iris-versicolor Iris-versicolor
## 28 Iris-versicolor Iris-virginica
## 29 Iris-virginica Iris-virginica
## 30 Iris-versicolor Iris-virginica
## 31 Iris-versicolor Iris-virginica
## 32 Iris-versicolor Iris-virginica
## 33 Iris-versicolor Iris-virginica
## 34 Iris-versicolor Iris-virginica
## 35 Iris-virginica Iris-virginica
## 36 Iris-virginica Iris-virginica
## 37 Iris-versicolor Iris-virginica
## 38 Iris-virginica Iris-virginica
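To pull out just the misclassified rows, bring both columns into R and filter; a small sketch (converting to character avoids any factor-level mismatch):
res <- as.data.frame(h2o.cbind(p$predict, test$class))
res[as.character(res$predict) != as.character(res$class), ]  # only the wrong rows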
We can compute the proportion of test samples the model got right: the
comparison p$predict == test$class produces a frame of 0/1 values on the
cluster, and mean() reduces it to a single number:
mean(p$predict == test$class)
## [1] 0.8157895
Alternatively, use h2o.performance() on the test set to
get MSE, confusion matrix, and hit ratios:
h2o.performance(m, test)
## H2OMultinomialMetrics: deeplearning
##
## Test Set Metrics:
## =====================
##
## MSE: (Extract with `h2o.mse`) 0.1523254
## RMSE: (Extract with `h2o.rmse`) 0.3902889
## Logloss: (Extract with `h2o.logloss`) 0.5163046
## Mean Per-Class Error: 0.2121212
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Iris-setosa Iris-versicolor Iris-virginica Error Rate
## Iris-setosa 12 0 0 0.0000 = 0 / 12
## Iris-versicolor 0 15 0 0.0000 = 0 / 15
## Iris-virginica 0 7 4 0.6364 = 7 / 11
## Totals 12 22 4 0.1842 = 7 / 38
##
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
## =======================================================================
## Top-3 Hit Ratios:
## k hit_ratio
## 1 1 0.815789
## 2 2 1.000000
## 3 3 1.000000
The Hit Ratio Table repeats the same accuracy as Hit Ratio @ 1 (0.8158). Hit Ratio @ 2 = 1.0 means the correct species is always within the model's top two guesses. The confusion matrix shows where the errors fall: here, 7 of the 11 virginica test flowers were predicted as versicolor.
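The individual numbers can also be pulled out of the metrics object rather than read off the printout; a short sketch:
perf <- h2o.performance(m, test)
h2o.mse(perf)                   # test-set MSE
h2o.logloss(perf)               # test-set log loss
h2o.confusionMatrix(m, test)    # confusion matrix on the test set
h2o.hit_ratio_table(m, test)    # top-k hit ratios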