R-h2o

# I already downloaded the package
library(h2o)
h2o.init()


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /var/folders/g8/k40fgv1n08g8cmq06dchpspw0000gn/T//RtmptxzRKR/file3dcc76f7b3/h2o_gretacapelletti_started_from_r.out
    /var/folders/g8/k40fgv1n08g8cmq06dchpspw0000gn/T//RtmptxzRKR/file3dc56800f29/h2o_gretacapelletti_started_from_r.err

java version "1.8.0_441"
Java(TM) SE Runtime Environment (build 1.8.0_441-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.441-b07, mixed mode)


Starting H2O JVM and connecting: ..... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         4 seconds 805 milliseconds 
    H2O cluster timezone:       America/New_York 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.44.0.3 
    H2O cluster version age:    1 year, 1 month and 14 days 
    H2O cluster name:           H2O_started_from_R_gretacapelletti_ezy703 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.77 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 4.4.2 (2024-10-31)

Warning: 
Your H2O cluster version is (1 year, 1 month and 14 days) old. There may be a newer version available.
Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html

??h2o

# Initialize the H2O instance
# Display connection details for the H2O cluster
h2o.init(nthreads = -1)

 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         6 seconds 985 milliseconds 
    H2O cluster timezone:       America/New_York 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.44.0.3 
    H2O cluster version age:    1 year, 1 month and 14 days 
    H2O cluster name:           H2O_started_from_R_gretacapelletti_ezy703 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.77 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 4.4.2 (2024-10-31)

Warning: 
Your H2O cluster version is (1 year, 1 month and 14 days) old. There may be a newer version available.
Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html

# Define the URL for the dataset
datasets <- "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
# Import the dataset from the URL
data <- h2o.importFile(paste0(datasets, "iris_wheader.csv"))



# Specify the target variable (the column to predict)
y <- "class"

# Predictor variables (all columns except the target)
x <- setdiff(names(data), y)

# Split the data into training (80%) and test (20%) sets
parts <- h2o.splitFrame(data, ratios = 0.8)
train <- parts[[1]]
test <- parts[[2]]
# Train a deep learning model
m <- h2o.deeplearning(x = x, y = y, training_frame = train)
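
The split above is random, so the sizes of `train` and `test`, and every metric that follows, change from run to run. `h2o.splitFrame()` accepts a `seed` argument for reproducibility, and its split is approximate rather than exact (the accuracy of 0.7567568 reported later is consistent with 28 correct out of 37 test rows, not exactly 30). The sketch below is a base R analogue of a seeded, probabilistic split on R's built-in `iris` data, not H2O's implementation; `split_rows` is a hypothetical helper introduced only for illustration.

```r
# Base R analogue of a seeded, probabilistic 80/20 split.
# Assumption: each row goes to the training set with probability `ratio`,
# so the resulting sizes are only approximately 80% and 20%.
split_rows <- function(df, ratio = 0.8, seed = 1234) {
  set.seed(seed)                       # same seed -> same split
  in_train <- runif(nrow(df)) < ratio  # one random draw per row
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

parts_a <- split_rows(iris)
parts_b <- split_rows(iris)

# Every row lands in exactly one of the two parts.
nrow(parts_a$train) + nrow(parts_a$test) == nrow(iris)

# Re-running with the same seed selects the same test rows.
identical(rownames(parts_a$test), rownames(parts_b$test))
```

In the notebook itself, the equivalent change would be to pass a fixed `seed` to `h2o.splitFrame()` so that the reported numbers can be reproduced.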



# Make predictions on the test set
p <- h2o.predict(m, test)



# Now check the model's performance using the mean squared error (MSE) and a confusion matrix.
# Mean squared error (called on the model alone, this is the training metric)
h2o.mse(m)

[1] 0.09422929
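
For a classifier, the MSE is computed from the predicted class probabilities rather than from the predicted labels. My understanding, stated here as an assumption, is that H2O takes the mean of (1 - p)^2, where p is the probability the model assigned to the true class. The probabilities below are made-up values for illustration, not output from this notebook.

```r
# Hypothetical class probabilities for four flowers (each row sums to 1).
probs <- matrix(
  c(0.9, 0.1, 0.0,
    0.2, 0.7, 0.1,
    0.1, 0.5, 0.4,
    0.0, 0.2, 0.8),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Iris-setosa", "Iris-versicolor", "Iris-virginica"))
)
actual <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica")

# Probability the model gave to the true class of each row.
p_true <- probs[cbind(seq_along(actual), match(actual, colnames(probs)))]

# Assumed definition: mean squared shortfall from a probability of 1.
mse <- mean((1 - p_true)^2)
mse
```

Under this definition a confident wrong answer is penalized more than a hesitant one, which is why the MSE can move even when the accuracy does not.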

# Confusion matrix
h2o.confusionMatrix(m)

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class

# Note: this model struggles to predict Iris-versicolor.

# Show the predicted and actual classes side by side as a data frame
as.data.frame(h2o.cbind(p$predict, test$class))


# Calculate the accuracy of the model by comparing predicted and actual classes
mean(p$predict == test$class)

[1] 0.7567568
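
The same kind of figure can be reproduced by hand once predictions are in an ordinary R data frame. The sketch below uses short made-up label vectors (not this notebook's predictions) to show how accuracy, the confusion matrix, and the mean per-class error relate to one another.

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Hypothetical actual and predicted labels, for illustration only.
actual    <- factor(c("Iris-setosa", "Iris-setosa", "Iris-versicolor",
                      "Iris-versicolor", "Iris-virginica", "Iris-virginica"),
                    levels = classes)
predicted <- factor(c("Iris-setosa", "Iris-setosa", "Iris-versicolor",
                      "Iris-virginica", "Iris-virginica", "Iris-virginica"),
                    levels = classes)

# Accuracy: share of rows where the prediction matches the label.
accuracy <- mean(predicted == actual)

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)

# Per-class error: share of each actual class that was misclassified.
per_class_error <- 1 - diag(cm) / rowSums(cm)
mean_per_class_error <- mean(per_class_error)

print(cm)
accuracy
per_class_error
```

Accuracy weights every row equally, while the mean per-class error weights every class equally, so the two can diverge when one class is harder to predict than the others.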

# Evaluate the model's performance on the test set and display metrics
h2o.performance(m, test)

H2OMultinomialMetrics: deeplearning

Test Set Metrics: 
=====================

MSE: (Extract with `h2o.mse`) 0.2239617
RMSE: (Extract with `h2o.rmse`) 0.4732459
Logloss: (Extract with `h2o.logloss`) 0.9359359
Mean Per-Class Error: 0.1875
AUC: (Extract with `h2o.auc`) NaN
AUCPR: (Extract with `h2o.aucpr`) NaN
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
=======================================================================
Top-3 Hit Ratios:


# The output summarizes the model's performance. The MSE (mean squared error) is the average squared difference between the predicted and actual values; a lower value is better.
# The confusion matrix shows how the model classified each class, giving a per-class breakdown of correct and incorrect predictions.
# The accuracy is the percentage of correctly classified instances; in this run it is 75.68% (0.7567568) on the test set.
# The test-set MSE (0.224) is higher than the training MSE (0.094) and the mean per-class error is 0.1875, so in this run the model performs moderately on the Iris classes, with Iris-versicolor the hardest to predict.
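
Because `h2o.mse(m)` and `h2o.confusionMatrix(m)` report training metrics when called on the model alone, the figures discussed above are best read from the test set. A minimal sketch, assuming the h2o package is installed and `h2o.init()` has already been called; `test_metrics` is a hypothetical helper that only wraps `h2o.performance()` and the accessors named in the output above.

```r
# Hypothetical helper: collect held-out metrics for a fitted H2O model.
# Assumes the h2o package is installed and a cluster is already running.
test_metrics <- function(model, test_frame) {
  perf <- h2o::h2o.performance(model, newdata = test_frame)
  list(
    mse       = h2o::h2o.mse(perf),
    rmse      = h2o::h2o.rmse(perf),
    logloss   = h2o::h2o.logloss(perf),
    confusion = h2o::h2o.confusionMatrix(model, test_frame)
  )
}

# Usage with the objects defined earlier in this notebook:
# metrics <- test_metrics(m, test)
# metrics$mse
```

Keeping the test metrics in one list makes it harder to quote a training figure by accident when describing how well the model generalizes.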
