This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Below I write some simple code to install the h2o package, host an H2O server, run an analysis with a machine learning algorithm, and finally build a high-accuracy model.
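Install the h2o package by running install.packages("h2o"), then continue with the code below. I tried starting H2O with all cores in use and a maximum of 4 GB of memory allocated from the system, but got an error because the request exceeded the available memory; H2O also works better with 64-bit Java. The call below sets only nthreads = -1, and the cluster reports 0.87 GB of total memory.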
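The 4 GB request described above might look like the following sketch. It uses the max_mem_size argument of h2o.init; the helper name start_h2o is hypothetical, and it is an assumption that a failed start surfaces as an ordinary R error that tryCatch can intercept on your setup.

```r
# Hypothetical helper (not part of h2o): request a fixed JVM heap size and
# fall back to the default heap if that start-up attempt raises an error.
start_h2o <- function(max_mem = "4g") {
  tryCatch(
    h2o::h2o.init(nthreads = -1, max_mem_size = max_mem),
    error = function(e) {
      message("Start with ", max_mem, " failed: ", conditionMessage(e))
      h2o::h2o.init(nthreads = -1)  # same call as used in this notebook
    }
  )
}
```

Nothing starts until the helper is called; start_h2o("2g") would request 2 GB instead.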
The server is hosted on localhost. Open http://localhost:54321/flow/index.html in a browser, and all of the operations below can also be carried out there.
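If the server runs on a different host or port, the same address can be assembled from those two values. A small sketch, where flow_url is a hypothetical helper and the defaults are the ip and port reported when the server starts:

```r
# Hypothetical helper: build the Flow address from host and port.
flow_url <- function(ip = "localhost", port = 54321L) {
  sprintf("http://%s:%d/flow/index.html", ip, port)
}

flow_address <- flow_url()
print(flow_address)
# utils::browseURL(flow_address)  # uncomment to open Flow in the default browser
```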
H2O Cloudframe is open source and gives good results with the reliable machine learning algorithms built into the package.
Invoking the library and running the h2o server.
library(h2o)
h2o.init(nthreads = -1)
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
C:\Users\SHARAT~1\AppData\Local\Temp\RtmpsvFGAF/h2o_Sharathchandra_B_M_started_from_r.out
C:\Users\SHARAT~1\AppData\Local\Temp\RtmpsvFGAF/h2o_Sharathchandra_B_M_started_from_r.err
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
Starting H2O JVM and connecting: . Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 7 seconds 280 milliseconds
H2O cluster timezone: America/New_York
H2O data parsing timezone: UTC
H2O cluster version: 3.20.0.2
H2O cluster version age: 15 days
H2O cluster name: H2O_started_from_R_Sharathchandra_B_M_era184
H2O cluster total nodes: 1
H2O cluster total memory: 0.87 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.1 (2017-06-30)
Reading a data file into the H2O cluster. h2o.importFile can read from a URL or from a local path; here I imported the data from a local file.
dataset <- h2o.importFile("C:/Users/Sharathchandra B M/Desktop/iris_wheader.csv",destination_frame = "tryout")
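The CSV itself is not included with the notebook. To follow along, a stand-in can be built from R's built-in iris data. The column order and the "Iris-" label prefix are assumptions inferred from the names that appear in the outputs, and this is not the author's file.

```r
# Build a stand-in for iris_wheader.csv from the iris data shipped with R.
iris_csv <- iris
names(iris_csv) <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "class")
iris_csv$class <- paste0("Iris-", iris_csv$class)  # e.g. "Iris-setosa"

# Write it to a temporary location (assumed path, change as needed).
csv_path <- file.path(tempdir(), "iris_wheader.csv")
write.csv(iris_csv, csv_path, row.names = FALSE)

# With a running H2O server, the import would then be:
# dataset <- h2o.importFile(csv_path, destination_frame = "tryout")
```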
The column to predict is "class", so it is assigned to y; every other column becomes a predictor in x.
y <- "class"
x <- setdiff(names(dataset), y)
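The setdiff call simply drops the target name from the list of column names. The same step on a plain character vector, assuming the header contains the five names seen in the outputs:

```r
# Assumed header of the iris file (order is a guess).
cols <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "class")

y <- "class"
x <- setdiff(cols, y)  # everything except the target column
print(x)
# [1] "sepal_len" "sepal_wid" "petal_len" "petal_wid"
```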
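Splitting the dataset into a train dataset and a test dataset. The train dataset gets about 80% of the data and the test dataset about 20%; h2o.splitFrame splits probabilistically, so the proportions are not exact (the training frame in this run has 112 rows).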
parts <- h2o.splitFrame(dataset,0.8)
train <- parts[[1]]
test <- parts[[2]]
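A probabilistic split assigns each row to the train set with a given probability, so the resulting sizes vary from run to run. The base R sketch below illustrates the idea only; it is not H2O's implementation, and the row count of 150 is an assumption about the file. h2o.splitFrame also accepts a seed argument to make its own split repeatable.

```r
set.seed(42)                # arbitrary seed, for a repeatable draw
n <- 150                    # assumed number of rows in the iris file
in_train <- runif(n) < 0.8  # each row lands in train with probability 0.8

n_train <- sum(in_train)
n_test  <- n - n_train
cat("train:", n_train, "test:", n_test, "\n")

# With a running H2O server, a seeded split would be:
# parts <- h2o.splitFrame(dataset, ratios = 0.8, seed = 1234)
```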
Fitting a deep learning model on the train dataset.
model <- h2o.deeplearning(x,y,train,model_id = "ModelusingDeepLearning")
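The call above relies on positional arguments and on defaults. A sketch of the same fit with the arguments named and the defaults written out (two hidden layers of 200 units, 10 epochs) follows. The wrapper name fit_iris_dl and the seed value are hypothetical; reproducible = TRUE trades speed for repeatable results.

```r
# Hypothetical wrapper: the same model, with arguments spelled out.
fit_iris_dl <- function(x, y, train, seed = 1234) {
  h2o::h2o.deeplearning(
    x = x,
    y = y,
    training_frame = train,
    model_id = "ModelusingDeepLearning",
    hidden = c(200, 200),  # two hidden layers of 200 units each
    epochs = 10,           # passes over the training data
    seed = seed,           # arbitrary value, fixes the random initialisation
    reproducible = TRUE    # repeatable on small data, but slower
  )
}

# Usage, with a running H2O server: model <- fit_iris_dl(x, y, train)
```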
summary(model)
Model Details:
==============
H2OMultinomialModel: deeplearning
Model Key: ModelusingDeepLearning
Status of Neuron Layers: predicting class, 3-class classification, multinomial distribution, CrossEntropy loss, 41,803 weights/biases, 498.1 KB, 1,120 training samples, mini-batch size 1
layer units type dropout l1 l2 mean_rate rate_rms momentum mean_weight
1 1 4 Input 0.00 %
2 2 200 Rectifier 0.00 % 0.000000 0.000000 0.004459 0.003628 0.000000 0.000811
3 3 200 Rectifier 0.00 % 0.000000 0.000000 0.010270 0.012065 0.000000 -0.001171
4 4 3 Softmax 0.000000 0.000000 0.003407 0.006725 0.000000 0.014556
weight_rms mean_bias bias_rms
1
2 0.107029 0.487214 0.009407
3 0.070107 0.998823 0.004643
4 0.395332 0.000225 0.000875
H2OMultinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("RTMP_sid_ba39_1")`
MSE: (Extract with `h2o.mse`) 0.1321503
RMSE: (Extract with `h2o.rmse`) 0.3635248
Logloss: (Extract with `h2o.logloss`) 0.564181
Mean Per-Class Error: 0.1382114
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
Iris-setosa Iris-versicolor Iris-virginica Error Rate
Iris-setosa 37 0 0 0.0000 = 0 / 37
Iris-versicolor 0 34 0 0.0000 = 0 / 34
Iris-virginica 0 17 24 0.4146 = 17 / 41
Totals 37 51 24 0.1518 = 17 / 112
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios:
k hit_ratio
1 1 0.848214
2 2 1.000000
3 3 1.000000
Scoring History:
timestamp duration training_speed epochs iterations samples training_rmse
1 2018-07-01 16:16:54 0.000 sec 0.00000 0 0.000000
2 2018-07-01 16:16:56 1.966 sec 474 obs/sec 1.00000 1 112.000000 0.60107
3 2018-07-01 16:16:57 2.443 sec 1620 obs/sec 10.00000 10 1120.000000 0.36352
training_logloss training_r2 training_classification_error
1
2 2.20260 0.48029 0.36607
3 0.56418 0.80990 0.15179
Variable Importances: (Extract with `h2o.varimp`)
=================================================
Variable Importances:
variable relative_importance scaled_importance percentage
1 petal_wid 1.000000 1.000000 0.269435
2 petal_len 0.926808 0.926808 0.249714
3 sepal_len 0.910143 0.910143 0.245224
4 sepal_wid 0.874524 0.874524 0.235627
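Two figures in the summary can be checked by hand. A fully connected layer has one weight per input-unit pair plus one bias per unit, which gives the 41,803 weights/biases for layers of 4, 200, 200 and 3 units. The 1,120 training samples are the 112 training rows seen over 10 epochs. A minimal sketch, with count_params as a hypothetical helper name:

```r
# Weights plus biases of a fully connected network:
# each layer adds (inputs * units) weights and (units) biases.
count_params <- function(layer_sizes) {
  n_in  <- head(layer_sizes, -1)  # sizes feeding each layer
  n_out <- tail(layer_sizes, -1)  # sizes of each layer after the input
  sum(n_in * n_out + n_out)
}

count_params(c(4, 200, 200, 3))
# [1] 41803

112 * 10  # training rows times epochs
# [1] 1120
```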
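Predict values for the test data (testing the model). Note that h2o.confusionMatrix(model), called without new data, displays the confusion matrix for the training data (112 rows), not for the test data.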
predict <- h2o.predict(model,test)
h2o.confusionMatrix(model)
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
Iris-setosa Iris-versicolor Iris-virginica Error Rate
Iris-setosa 37 0 0 0.0000 = 0 / 37
Iris-versicolor 0 34 0 0.0000 = 0 / 34
Iris-virginica 0 17 24 0.4146 = 17 / 41
Totals 37 51 24 0.1518 = 17 / 112
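The error figures in the matrix follow directly from its counts. A short base R check, with the counts copied from the output above:

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Rows are actual classes, columns are predicted classes.
cm <- matrix(
  c(37,  0,  0,
     0, 34,  0,
     0, 17, 24),
  nrow = 3, byrow = TRUE,
  dimnames = list(actual = classes, predicted = classes)
)

per_class_error <- 1 - diag(cm) / rowSums(cm)   # 0, 0, 17/41
overall_error   <- 1 - sum(diag(cm)) / sum(cm)  # 17/112
mean_per_class  <- mean(per_class_error)

round(per_class_error, 4)
round(overall_error, 4)
round(mean_per_class, 7)
```

The overall error of 17/112 is one minus the top-1 hit ratio of 0.848214 reported in the model summary.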
Creating a dataframe of predicted values and actual values
display <- as.data.frame(h2o.cbind(predict$predict,test$class))
display
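Once predictions and labels sit side by side in a data frame, base R can cross-tabulate them or list the rows that disagree. The sketch below uses a toy data frame with made-up labels in place of display; the column names predict and class are assumed to match.

```r
# Toy stand-in for `display`: made-up labels, not the notebook's results.
toy <- data.frame(
  predict = c("Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-versicolor"),
  class   = c("Iris-setosa", "Iris-virginica", "Iris-virginica",  "Iris-versicolor"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are actual classes, columns are predicted classes.
xt <- table(actual = toy$class, predicted = toy$predict)
print(xt)

# Rows where the prediction and the label disagree.
misses <- toy[toy$predict != toy$class, ]
print(misses)
```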
Accuracy of the model on the test data
mean(predict$predict==test$class)
[1] 0.9736842
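This value is consistent with 37 of 38 test rows being predicted correctly, assuming the file holds 150 rows (150 minus the 112 training rows leaves 38). For a fuller picture on held-out data, h2o.performance can score the model on the test frame. The helper name test_metrics below is hypothetical.

```r
# Hypothetical helper: score a fitted model on held-out data.
test_metrics <- function(model, test) {
  perf <- h2o::h2o.performance(model, newdata = test)
  list(
    confusion = h2o::h2o.confusionMatrix(perf),  # test-set confusion matrix
    logloss   = h2o::h2o.logloss(perf)           # test-set log loss
  )
}

# Arithmetic check on the accuracy printed above.
37 / 38
# [1] 0.9736842
```

Usage, with a running H2O server: test_metrics(model, test)$confusion.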
Shut down the h2o server instance. The Y below is my answer to the confirmation prompt.
h2o.shutdown()
Y
[1] TRUE