Step 1: Collecting data

We’ll use a dataset donated to the UCI Machine Learning Data Repository (http://archive.ics.uci.edu/ml) by W. Frey and D. J. Slate. The dataset contains 20,000 examples of the 26 capital letters of the English alphabet, printed using 20 different randomly reshaped and distorted black-and-white fonts.

Step 2: Exploring and preparing the data

read in data and examine structure

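# stringsAsFactors = TRUE keeps letter as a factor under R >= 4.0, where the default is FALSE
letters <- read.csv("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml11/letterdata.csv",
                    stringsAsFactors = TRUE)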
str(letters)

'data.frame':   20000 obs. of  17 variables:
 $ letter: Factor w/ 26 levels "A","B","C","D",..: 20 9 4 14 7 19 2 1 10 13 ...
 $ xbox  : int  2 5 4 7 2 4 4 1 2 11 ...
 $ ybox  : int  8 12 11 11 1 11 2 1 2 15 ...
 $ width : int  3 3 6 6 3 5 5 3 4 13 ...
 $ height: int  5 7 8 6 1 8 4 2 4 9 ...
 $ onpix : int  1 2 6 3 1 3 4 1 2 7 ...
 $ xbar  : int  8 10 10 5 8 8 8 8 10 13 ...
 $ ybar  : int  13 5 6 9 6 8 7 2 6 2 ...
 $ x2bar : int  0 5 2 4 6 6 6 2 2 6 ...
 $ y2bar : int  6 4 6 6 6 9 6 2 6 2 ...
 $ xybar : int  6 13 10 4 6 5 7 8 12 12 ...
 $ x2ybar: int  10 3 3 4 5 6 6 2 4 1 ...
 $ xy2bar: int  8 9 7 10 9 6 6 8 8 9 ...
 $ xedge : int  0 2 3 6 1 0 2 1 1 8 ...
 $ xedgey: int  8 8 7 10 7 8 8 6 6 1 ...
 $ yedge : int  0 4 3 2 5 9 7 2 1 1 ...
 $ yedgex: int  8 10 9 8 10 7 10 7 7 8 ...

divide into training and test data

letters_train <- letters[1:16000, ]
letters_test  <- letters[16001:20000, ]
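
The sequential split above is a fair holdout only if the rows are already in random order. When that is not known, a seeded random split is a common alternative. The following is a minimal sketch: the helper name `split_train_test` and its arguments are hypothetical, not part of the original analysis, and it is demonstrated on a small made-up data frame rather than the letter data.

```r
# Hypothetical helper: randomly split a data frame into training and test sets.
split_train_test <- function(df, train_frac = 0.8, seed = 12345) {
  set.seed(seed)                                        # make the split reproducible
  n_train   <- floor(train_frac * nrow(df))
  train_idx <- sample(nrow(df), n_train)                # row indices, drawn without replacement
  test_idx  <- setdiff(seq_len(nrow(df)), train_idx)    # every row not used for training
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Toy data frame standing in for the letter data (made-up values).
toy <- data.frame(letter = factor(rep(c("A", "B"), each = 10)),
                  xbox   = 1:20)
parts <- split_train_test(toy, train_frac = 0.8)
nrow(parts$train)  # 16
nrow(parts$test)   # 4
```

With the real data, the same helper could be called as `split_train_test(letters, 0.8)` to get a 16,000/4,000 split.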

Step 3: Training a model on the data

begin by training a simple linear SVM

library(kernlab)


letter_classifier <- ksvm(letter ~ ., data = letters_train,
                          kernel = "vanilladot")

 Setting default kernel parameters

look at basic information about the model

letter_classifier

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 1 

Linear (vanilla) kernel function. 

Number of Support Vectors : 7037 

Objective Function Value : -14.1746 -20.0072 -23.5628 -6.2009 -7.5524 -32.7694 -49.9786 -18.1824 -62.1111 -32.7284 -16.2209 -32.2837 -28.9777 -51.2195 -13.276 -35.6217 -30.8612 -16.5256 -14.6811 -32.7475 -30.3219 -7.7956 -11.8138 -32.3463 -13.1262 -9.2692 -153.1654 -52.9678 -76.7744 -119.2067 -165.4437 -54.6237 -41.9809 -67.2688 -25.1959 -27.6371 -26.4102 -35.5583 -41.2597 -122.164 -187.9178 -222.0856 -21.4765 -10.3752 -56.3684 -12.2277 -49.4899 -9.3372 -19.2092 -11.1776 -100.2186 -29.1397 -238.0516 -77.1985 -8.3339 -4.5308 -139.8534 -80.8854 -20.3642 -13.0245 -82.5151 -14.5032 -26.7509 -18.5713 -23.9511 -27.3034 -53.2731 -11.4773 -5.12 -13.9504 -4.4982 -3.5755 -8.4914 -40.9716 -49.8182 -190.0269 -43.8594 -44.8667 -45.2596 -13.5561 -17.7664 -87.4105 -107.1056 -37.0245 -30.7133 -112.3218 -32.9619 -27.2971 -35.5836 -17.8586 -5.1391 -43.4094 -7.7843 -16.6785 -58.5103 -159.9936 -49.0782 -37.8426 -32.8002 -74.5249 -133.3423 -11.1638 -5.3575 -12.438 -30.9907 -141.6924 -54.2953 -179.0114 -99.8896 -10.288 -15.1553 -3.7815 -67.6123 -7.696 -88.9304 -47.6448 -94.3718 -70.2733 -71.5057 -21.7854 -12.7657 -7.4383 -23.502 -13.1055 -239.9708 -30.4193 -25.2113 -136.2795 -140.9565 -9.8122 -34.4584 -6.3039 -60.8421 -66.5793 -27.2816 -214.3225 -34.7796 -16.7631 -135.7821 -160.6279 -45.2949 -25.1023 -144.9059 -82.2352 -327.7154 -142.0613 -158.8821 -32.2181 -32.8887 -52.9641 -25.4937 -47.9936 -6.8991 -9.7293 -36.436 -70.3907 -187.7611 -46.9371 -89.8103 -143.4213 -624.3645 -119.2204 -145.4435 -327.7748 -33.3255 -64.0607 -145.4831 -116.5903 -36.2977 -66.3762 -44.8248 -7.5088 -217.9246 -12.9699 -30.504 -2.0369 -6.126 -14.4448 -21.6337 -57.3084 -20.6915 -184.3625 -20.1052 -4.1484 -4.5344 -0.828 -121.4411 -7.9486 -58.5604 -21.4878 -13.5476 -5.646 -15.629 -28.9576 -20.5959 -76.7111 -27.0119 -94.7101 -15.1713 -10.0222 -7.6394 -1.5784 -87.6952 -6.2239 -99.3711 -101.0906 -45.6639 -24.0725 -61.7702 -24.1583 -52.2368 -234.3264 -39.9749 -48.8556 -34.1464 -20.9664 -11.4525 -123.0277 -6.4903 -5.1865 
-8.8016 -9.4618 -21.7742 -24.2361 -123.3984 -31.4404 -88.3901 -30.0924 -13.8198 -9.2701 -3.0823 -87.9624 -6.3845 -13.968 -65.0702 -105.523 -13.7403 -13.7625 -50.4223 -2.933 -8.4289 -80.3381 -36.4147 -112.7485 -4.1711 -7.8989 -1.2676 -90.8037 -21.4919 -7.2235 -47.9557 -3.383 -20.433 -64.6138 -45.5781 -56.1309 -6.1345 -18.6307 -2.374 -72.2553 -111.1885 -106.7664 -23.1323 -19.3765 -54.9819 -34.2953 -64.4756 -20.4115 -6.689 -4.378 -59.141 -34.2468 -58.1509 -33.8665 -10.6902 -53.1387 -13.7478 -20.1987 -55.0923 -3.8058 -60.0382 -235.4841 -12.6837 -11.7407 -17.3058 -9.7167 -65.8498 -17.1051 -42.8131 -53.1054 -25.0437 -15.302 -44.0749 -16.9582 -62.9773 -5.204 -5.2963 -86.1704 -3.7209 -6.3445 -1.1264 -122.5771 -23.9041 -355.0145 -31.1013 -32.619 -4.9664 -84.1048 -134.5957 -72.8371 -23.9002 -35.3077 -11.7119 -22.2889 -1.8598 -59.2174 -8.8994 -150.742 -1.8533 -1.9711 -9.9676 -0.5207 -26.9229 -30.429 -5.6289 
Training error : 0.130062

This output, including the training error of about 13 percent, tells us very little about how well the model will perform in the real world. We’ll need to examine its performance on the testing dataset to know whether it generalizes well to unseen data.

Step 4: Evaluating model performance

predictions on testing dataset

letter_predictions <- predict(letter_classifier, letters_test)
head(letter_predictions)

[1] U N V X N H
Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

table(letters_test$letter, letter_predictions)

   letter_predictions
      A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U
  A 144   0   0   2   0   0   1   0   0   0   1   0   0   0   1   0   0   0   1   0   1
  B   0 121   0   2   0   0   1   0   1   1   1   0   0   0   0   0   0   7   1   0   0
  C   0   0 120   0   5   0   2   0   0   0   9   0   1   0   2   0   0   0   0   0   3
  D   0   5   0 156   0   0   1   1   0   0   0   0   1   0   1   1   0   0   0   0   1
  E   0   2   4   0 127   0   9   0   0   0   0   2   0   0   0   0   0   1   1   3   0
  F   0   0   0   1   3 138   2   1   1   1   0   0   0   1   0   2   0   0   0   2   0
  G   0   1  10   3   1   2 123   0   0   0   2   1   1   0   1   1   8   3   3   0   0
  H   0   2   2  10   1   2   2 102   0   2   5   1   1   1   2   0   2   8   0   0   2
  I   0   0   2   4   0   6   0   0 141   5   0   0   0   0   0   0   0   0   1   0   0
  J   1   0   0   3   0   0   0   2   8 128   0   0   0   0   1   0   0   0   1   0   0
  K   0   1   1   4   3   0   1   3   0   0 118   0   0   0   0   0   0  13   0   1   0
  L   0   0   3   3   4   0   2   2   0   0   0 133   0   0   0   0   3   0   1   0   0
  M   1   1   0   0   0   0   1   3   0   0   0   0 135   0   0   0   0   0   0   0   0
  N   2   0   0   5   0   0   0   4   0   0   2   0   4 145   1   0   0   1   0   0   0
  O   2   0   2   5   0   0   1  20   0   1   0   0   0   0  99   2   3   1   0   0   1
  P   0   2   0   3   0  16   2   0   1   1   1   0   0   0   3 130   1   1   0   0   0
  Q   5   2   0   1   2   0   8   2   0   3   0   1   0   0   3   0 124   0  14   0   0
  R   0   3   0   4   0   0   2   3   0   0   7   0   0   3   0   0   0 138   0   0   0
  S   1   5   0   0  10   3   4   0   3   2   0   5   0   0   0   0   5   0 101   3   0
  T   1   0   0   0   0   0   3   3   0   0   1   0   0   0   0   0   0   1   3 133   0
  U   1   0   0   0   0   0   0   0   0   0   3   0   3   1   3   0   0   0   0   1 152
  V   0   2   0   0   0   1   0   2   0   0   0   0   0   0   0   0   0   1   0   0   0
  W   1   0   0   0   0   0   0   0   0   0   0   0   8   2   0   0   0   0   0   0   0
  X   0   1   0   3   2   1   1   0   5   1   5   0   0   0   0   0   0   0   2   0   1
  Y   0   0   0   3   0   2   0   1   1   0   0   0   0   0   0   1   2   0   0   2   1
  Z   1   0   0   1   3   0   0   0   1   6   0   1   0   0   0   0   0   0  10   2   0
   letter_predictions
      V   W   X   Y   Z
  A   0   0   0   3   2
  B   0   0   1   0   0
  C   0   0   0   0   0
  D   0   0   0   0   0
  E   0   0   2   0   1
  F   1   0   0   0   0
  G   3   1   0   0   0
  H   4   0   1   1   0
  I   0   0   3   0   3
  J   0   0   0   0   4
  K   0   0   1   0   0
  L   0   0   6   0   0
  M   1   2   0   0   0
  N   2   0   0   0   0
  O   1   0   1   0   0
  P   0   0   0   7   0
  Q   3   0   0   0   0
  R   1   0   0   0   0
  S   0   0   1   0  18
  T   0   0   0   3   3
  U   0   4   0   0   0
  V 126   4   0   0   0
  W   1 127   0   0   0
  X   0   0 137   0   0
  Y   4   0   1 127   0
  Z   0   0   1   0 132

look only at agreement vs. non-agreement

construct a vector of TRUE/FALSE indicating correct/incorrect predictions

agreement <- letter_predictions == letters_test$letter
table(agreement)

agreement
FALSE  TRUE 
  643  3357

prop.table(table(agreement))

agreement
  FALSE    TRUE 
0.16075 0.83925
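
The overall proportion hides which letters are hard to recognize. A per-letter breakdown can be read off the diagonal of the confusion matrix. The sketch below uses a hypothetical helper, `per_class_accuracy`, and a tiny made-up set of labels; with the objects above it could be called as `per_class_accuracy(letters_test$letter, letter_predictions)`.

```r
# Hypothetical helper: share of each actual class that was predicted correctly.
per_class_accuracy <- function(actual, predicted) {
  # Use a shared set of levels so the table is square even if a class is never predicted.
  lv <- union(levels(factor(actual)), levels(factor(predicted)))
  cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
  diag(cm) / rowSums(cm)   # correct predictions divided by actual count, per class
}

# Made-up labels for illustration only.
actual    <- factor(c("A", "A", "B", "B", "B", "C"))
predicted <- factor(c("A", "B", "B", "B", "C", "C"))

acc <- per_class_accuracy(actual, predicted)
acc                        # A = 1/2, B = 2/3, C = 1
mean(predicted == actual)  # overall accuracy: 4 of 6 correct
```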

Step 5: Improving model performance

set.seed(12345)
letter_classifier_rbf <- ksvm(letter ~ ., data = letters_train, kernel = "rbfdot")
letter_predictions_rbf <- predict(letter_classifier_rbf, letters_test)
table(letters_test$letter, letter_predictions_rbf)

   letter_predictions_rbf
      A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U
  A 151   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  B   0 128   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0   3   2   0   0
  C   0   0 133   0   2   0   2   0   0   0   1   0   0   0   2   0   0   1   0   0   1
  D   0   3   0 161   0   0   0   1   0   0   0   0   0   0   0   0   0   1   0   0   1
  E   0   0   3   0 137   0   8   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  F   0   1   0   0   2 148   0   0   0   0   0   0   0   2   0   0   0   0   0   0   0
  G   0   0   1   2   0   0 154   2   0   0   0   1   1   0   0   0   0   2   0   0   0
  H   0   2   0   8   0   0   2 126   0   0   4   0   1   0   0   0   1   5   0   0   1
  I   0   0   2   2   0   3   0   0 151   3   0   0   0   0   0   1   0   0   1   0   0
  J   0   0   0   3   1   0   0   1   3 136   0   0   0   0   1   0   0   0   2   0   0
  K   0   0   0   1   0   0   0   2   0   0 132   0   0   0   0   0   0   9   0   0   0
  L   0   1   1   0   4   0   2   1   0   0   0 142   0   0   0   0   0   1   1   0   0
  M   0   2   0   0   0   0   2   1   0   0   0   0 138   0   0   0   0   0   0   0   0
  N   0   1   0   1   0   0   0   3   0   0   1   0   1 150   5   0   0   3   0   0   0
  O   0   0   0   1   0   0   2   0   0   0   0   0   0   0 129   0   3   2   0   0   0
  P   0   2   0   3   1  11   1   1   0   0   0   0   0   0   2 141   3   1   0   0   0
  Q   3   1   0   1   0   0   0   1   0   0   0   0   0   0   4   0 158   0   0   0   0
  R   0   3   0   3   0   0   0   0   0   0   3   0   0   2   0   0   0 150   0   0   0
  S   0   3   0   0   2   1   0   0   0   0   0   1   0   0   0   0   0   0 152   0   0
  T   1   0   0   2   1   0   2   2   0   0   0   0   0   0   0   0   0   1   0 140   0
  U   0   0   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0 161
  V   0   3   0   0   0   1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  W   0   1   0   0   0   0   0   0   0   0   0   0   2   1   0   0   0   0   0   0   0
  X   0   1   0   2   0   0   0   0   1   0   2   0   0   0   0   0   0   0   0   0   0
  Y   0   0   0   3   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1
  Z   0   0   0   0   2   0   0   0   0   3   0   0   0   0   0   0   0   0   2   0   0
   letter_predictions_rbf
      V   W   X   Y   Z
  A   0   0   0   4   0
  B   0   0   1   0   0
  C   0   0   0   0   0
  D   0   0   0   0   0
  E   0   0   1   0   3
  F   0   0   0   0   0
  G   0   1   0   0   0
  H   0   0   0   1   0
  I   0   0   0   0   2
  J   0   0   0   0   1
  K   0   0   2   0   0
  L   0   0   4   0   0
  M   0   1   0   0   0
  N   1   0   0   0   0
  O   0   2   0   0   0
  P   0   0   0   2   0
  Q   0   0   0   0   0
  R   0   0   0   0   0
  S   0   0   1   0   1
  T   0   0   1   1   0
  U   2   3   0   0   0
  V 131   0   0   0   0
  W   0 135   0   0   0
  X   0   0 153   0   0
  Y   1   0   1 138   0
  Z   0   0   1   0 150

agreement_rbf <- letter_predictions_rbf == letters_test$letter
table(agreement_rbf)

agreement_rbf
FALSE  TRUE 
  275  3725

prop.table(table(agreement_rbf))

agreement_rbf
  FALSE    TRUE 
0.06875 0.93125
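
To put the two test-set results side by side, the error counts reported above can be compared directly, and the two agreement vectors can be cross-tabulated to see where the models differ. This is a minimal sketch: the counts 643 and 275 come from the output above, while the short logical vectors are made up for illustration. With the real objects the last step would be `table(agreement, agreement_rbf)`.

```r
# Test-set error counts reported above (4,000 test letters each).
errors_linear <- 643
errors_rbf    <- 275

# Relative reduction in misclassified letters when moving to the RBF kernel.
reduction <- (errors_linear - errors_rbf) / errors_linear
round(reduction, 3)

# Made-up agreement vectors standing in for agreement and agreement_rbf.
agree_a <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
agree_b <- c(TRUE, TRUE, TRUE,  FALSE, TRUE, TRUE)

# Rows: was model a correct? Columns: was model b correct?
comparison <- table(model_a = agree_a, model_b = agree_b)
comparison
```

The off-diagonal cells count the cases that only one of the two models classified correctly.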

using h2o deeplearning

library(h2o)

h2o.init()


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /var/folders/2z/36b018md22j8c18318cmt1f00000gn/T//RtmpKujUSj/h2o_meierhabarexiti_started_from_r.out
    /var/folders/2z/36b018md22j8c18318cmt1f00000gn/T//RtmpKujUSj/h2o_meierhabarexiti_started_from_r.err

java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-468-11M4833)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-468, mixed mode)


Starting H2O JVM and connecting: ... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         5 seconds 140 milliseconds 
    H2O cluster version:        3.10.4.6 
    H2O cluster version age:    1 month and 4 days  
    H2O cluster name:           H2O_started_from_R_meierhabarexiti_wti833 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.12 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 3.3.1 (2016-06-21) 

Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)

letterdata.hex <- h2o.importFile("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml11/letterdata.csv")


summary(letterdata.hex)

Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.

 letter xbox             ybox             width            height          
 U:813  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 D:805  1st Qu.: 3.000   1st Qu.: 5.000   1st Qu.: 4.000   1st Qu.: 4.000  
 P:803  Median : 4.000   Median : 7.000   Median : 5.000   Median : 6.000  
 T:796  Mean   : 4.024   Mean   : 7.035   Mean   : 5.122   Mean   : 5.372  
 M:792  3rd Qu.: 5.000   3rd Qu.: 9.000   3rd Qu.: 6.000   3rd Qu.: 7.000  
 A:789  Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.000  
 onpix            xbar             ybar           x2bar            y2bar           
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.0   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 2.000   1st Qu.: 6.000   1st Qu.: 6.0   1st Qu.: 3.000   1st Qu.: 4.000  
 Median : 3.000   Median : 7.000   Median : 7.0   Median : 4.000   Median : 5.000  
 Mean   : 3.506   Mean   : 6.898   Mean   : 7.5   Mean   : 4.629   Mean   : 5.179  
 3rd Qu.: 5.000   3rd Qu.: 8.000   3rd Qu.: 9.0   3rd Qu.: 6.000   3rd Qu.: 7.000  
 Max.   :15.000   Max.   :15.000   Max.   :15.0   Max.   :15.000   Max.   :15.000  
 xybar            x2ybar           xy2bar           xedge            xedgey          
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 7.000   1st Qu.: 5.000   1st Qu.: 7.000   1st Qu.: 1.000   1st Qu.: 8.000  
 Median : 8.000   Median : 6.000   Median : 8.000   Median : 3.000   Median : 8.000  
 Mean   : 8.282   Mean   : 6.454   Mean   : 7.929   Mean   : 3.046   Mean   : 8.339  
 3rd Qu.:10.000   3rd Qu.: 8.000   3rd Qu.: 9.000   3rd Qu.: 4.000   3rd Qu.: 9.000  
 Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.000  
 yedge            yedgex          
 Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 2.000   1st Qu.: 7.000  
 Median : 3.000   Median : 8.000  
 Mean   : 3.692   Mean   : 7.801  
 3rd Qu.: 5.000   3rd Qu.: 9.000  
 Max.   :15.000   Max.   :15.000

splits <- h2o.splitFrame(letterdata.hex, 0.80, seed = 1234)
# nfolds is the full argument name; the original nfold only worked through partial matching
dl <- h2o.deeplearning(x = 2:17, y = "letter", training_frame = splits[[1]],
                       activation = "RectifierWithDropout",
                       hidden = c(16, 16, 16), distribution = "multinomial",
                       input_dropout_ratio = 0.2,
                       epochs = 10, nfolds = 5, variable_importances = TRUE)


dl.predict <- h2o.predict(dl, splits[[2]])


dl@parameters

$model_id
[1] "DeepLearning_model_R_1496288651976_1"

$training_frame
[1] "RTMP_sid_a9a3_2"

$nfolds
[1] 5

$overwrite_with_best_model
[1] FALSE

$activation
[1] "RectifierWithDropout"

$hidden
[1] 16 16 16

$epochs
[1] 10.40785

$seed
[1] -5.085292e+18

$input_dropout_ratio
[1] 0.2

$distribution
[1] "multinomial"

$stopping_rounds
[1] 0

$variable_importances
[1] TRUE

$x
 [1] "xbox"   "ybox"   "width"  "height" "onpix"  "xbar"   "ybar"   "x2bar"  "y2bar" 
[10] "xybar"  "x2ybar" "xy2bar" "xedge"  "xedgey" "yedge"  "yedgex"

$y
[1] "letter"

h2o.performance(dl)

H2OMultinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 10076 samples **

Training Set Metrics: 
=====================

MSE: (Extract with `h2o.mse`) 0.8354604
RMSE: (Extract with `h2o.rmse`) 0.9140352
Logloss: (Extract with `h2o.logloss`) 2.617066
Mean Per-Class Error: 0.831097
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
    A   B C  D E F G H  I J K L M N O   P Q   R   S  T U V W  X Y Z  Error
A 117   6 0 12 0 0 3 0  0 0 0 0 2 0 0   2 0 224   8 24 0 0 0  0 0 0 0.7060
B   0 325 0  9 0 0 0 0  6 0 0 0 0 0 0  13 0   2  38 27 0 0 0  2 0 0 0.2299
C   0  25 0  9 0 1 4 0  7 0 0 0 1 0 0   1 0   0 234 38 1 0 0 61 0 0 1.0000
D   0  94 0  9 0 0 3 0 30 0 0 0 0 0 0 225 0   2  17 25 0 0 0  1 0 0 0.9778
E   0  88 0  0 0 0 3 0  1 0 0 0 0 0 0   1 0   0 272 13 0 0 0  6 0 2 1.0000
              Rate
A =      281 / 398
B =       97 / 422
C =      382 / 382
D =      397 / 406
E =      386 / 386

---
         A    B C   D E F   G H   I J K L   M  N O   P Q   R    S    T U  V  W   X Y   Z
V        0    1 0   9 0 0   0 0   0 0 0 0   1  0 0  12 0   0    1  360 0  0  0   0 0   0
W        0    0 0   2 0 0   0 0   0 0 0 0   5  6 0  10 0   0    1  311 0 56  3   0 0   0
X        0   74 0   3 0 0   0 0  53 0 0 0   0  0 0   0 0   0  176   71 0  0  0   4 0   0
Y        0    2 0   5 0 0   0 0   1 0 0 0   0  0 0   0 0   0   14  344 0  0  0   4 0   0
Z        0   35 0   0 0 0   0 0   0 0 0 0   0  0 0   0 0   2  235   20 0  0  0   0 0  85
Totals 134 2368 0 271 0 2 109 0 431 0 0 0 372 69 0 778 0 316 1782 3060 8 91 10 175 0 100
        Error             Rate
V      1.0000 =      384 / 384
W      0.9924 =      391 / 394
X      0.9895 =      377 / 381
Y      1.0000 =      370 / 370
Z      0.7745 =      292 / 377
Totals 0.8264 = 8,327 / 10,076

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-10 Hit Ratios: 
    k hit_ratio
1   1  0.173581
2   2  0.311731
3   3  0.411473
4   4  0.512703
5   5  0.575427
6   6  0.643112
7   7  0.714669
8   8  0.758833
9   9  0.789004
10 10  0.828702
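
The hit ratio table reports, for each k, the fraction of rows whose actual letter is among the model's k most probable classes, so the k = 1 row is the ordinary accuracy. The base R sketch below shows that calculation on a small made-up probability matrix. The helper `top_k_hit_ratio` is hypothetical and is not H2O's implementation.

```r
# Hypothetical helper: fraction of rows whose true class is among the k most probable classes.
# probs: numeric matrix of class probabilities with class names as column names.
top_k_hit_ratio <- function(probs, actual, k) {
  actual <- as.character(actual)
  hits <- vapply(seq_len(nrow(probs)), function(i) {
    top_k <- names(sort(probs[i, ], decreasing = TRUE))[seq_len(k)]
    actual[i] %in% top_k
  }, logical(1))
  mean(hits)
}

# Made-up class probabilities for four rows and three classes.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.2, 0.3, 0.5,
                  0.5, 0.4, 0.1),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
actual <- c("A", "C", "C", "B")

top_k_hit_ratio(probs, actual, k = 1)  # 0.5: rows 1 and 3 are hits
top_k_hit_ratio(probs, actual, k = 2)  # 1: every true class is in the top two
```

Note that the metrics printed above are reported on training data. `h2o.performance()` also accepts a `newdata` argument, so the held-out frame `splits[[2]]` could be scored the same way.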

h2o.varimp(dl)

Variable Importances: 
   variable relative_importance scaled_importance percentage
1     yedge            1.000000          1.000000   0.107052
2     xedge            0.902801          0.902801   0.096646
3    xy2bar            0.851448          0.851448   0.091149
4    x2ybar            0.813513          0.813513   0.087088
5    xedgey            0.769036          0.769036   0.082326
6      ybar            0.743807          0.743807   0.079626
7     y2bar            0.687911          0.687911   0.073642
8     x2bar            0.605888          0.605888   0.064861
9    yedgex            0.521135          0.521135   0.055788
10    onpix            0.499478          0.499478   0.053470
11     xbar            0.420246          0.420246   0.044988
12    width            0.376321          0.376321   0.040286
13    xybar            0.324365          0.324365   0.034724
14     ybox            0.303355          0.303355   0.032475
15     xbox            0.295416          0.295416   0.031625
16   height            0.226573          0.226573   0.024255
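
The columns of this table are related by simple arithmetic: `scaled_importance` divides each relative importance by the largest one, and `percentage` divides it by the sum of all of them. The sketch below checks this against the values printed above. The vector is copied from the table, and the variable names are chosen here for illustration.

```r
# Relative importances copied from the table above.
relative_importance <- c(
  yedge = 1.000000, xedge = 0.902801, xy2bar = 0.851448, x2ybar = 0.813513,
  xedgey = 0.769036, ybar = 0.743807, y2bar = 0.687911, x2bar = 0.605888,
  yedgex = 0.521135, onpix = 0.499478, xbar = 0.420246, width = 0.376321,
  xybar = 0.324365, ybox = 0.303355, xbox = 0.295416, height = 0.226573
)

scaled_importance <- relative_importance / max(relative_importance)
percentage        <- relative_importance / sum(relative_importance)

round(percentage[c("yedge", "height")], 6)  # matches the first and last rows of the table
sum(percentage)                             # the shares add up to 1
```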

h2o.shutdown()

SVM analysis on the OCR analysis letter data