Character Recognition using Support Vector Machines (SVM)

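Character recognition is one of the more challenging tasks in machine learning. In this example, we build a support vector machine (SVM) model and use it for character recognition. The dataset has 20,000 observations of 17 variables. The first is the target charac, a factor whose 26 levels are the letters A to Z. The other 16 are integer features, all on a 0 to 15 scale.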

## 'data.frame':    20000 obs. of  17 variables:
##  $ charac : Factor w/ 26 levels "A","B","C","D",..: 12 15 18 3 20 16 10 6 23 10 ...
##  $ xhash  : int  3 2 3 5 3 2 3 2 5 6 ...
##  $ yhash  : int  7 4 7 6 4 2 9 1 7 9 ...
##  $ breadth: int  3 3 3 6 3 3 5 3 5 9 ...
##  $ depth  : int  5 3 5 5 3 3 7 2 5 7 ...
##  $ onpix  : int  1 2 3 5 1 2 6 1 4 5 ...
##  $ xb     : int  0 7 6 5 5 6 9 5 3 8 ...
##  $ yb     : int  1 7 10 6 11 11 8 11 11 5 ...
##  $ x2b    : int  6 7 7 3 3 5 3 3 2 7 ...
##  $ y2b    : int  6 4 3 6 7 4 3 5 2 7 ...
##  $ xyb    : int  0 7 7 7 11 10 8 11 10 8 ...
##  $ x2yb   : int  0 5 4 7 9 7 4 9 9 6 ...
##  $ xy2b   : int  6 8 8 11 5 2 6 5 7 7 ...
##  $ xedge  : int  0 2 2 5 1 1 4 1 5 2 ...
##  $ xedge_y: int  8 8 7 11 11 10 8 10 11 6 ...
##  $ yedge  : int  0 3 5 7 2 4 6 3 1 4 ...
##  $ yedge_x: int  8 8 11 8 5 6 4 6 7 6 ...
##      charac          xhash            yhash           breadth      
##  U      :  813   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  D      :  805   1st Qu.: 3.000   1st Qu.: 5.000   1st Qu.: 4.000  
##  P      :  803   Median : 4.000   Median : 7.000   Median : 5.000  
##  T      :  796   Mean   : 4.024   Mean   : 7.035   Mean   : 5.122  
##  M      :  792   3rd Qu.: 5.000   3rd Qu.: 9.000   3rd Qu.: 6.000  
##  A      :  789   Max.   :15.000   Max.   :15.000   Max.   :15.000  
##  (Other):15202                                                     
##      depth            onpix              xb               yb      
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.0  
##  1st Qu.: 4.000   1st Qu.: 2.000   1st Qu.: 6.000   1st Qu.: 6.0  
##  Median : 6.000   Median : 3.000   Median : 7.000   Median : 7.0  
##  Mean   : 5.372   Mean   : 3.506   Mean   : 6.898   Mean   : 7.5  
##  3rd Qu.: 7.000   3rd Qu.: 5.000   3rd Qu.: 8.000   3rd Qu.: 9.0  
##  Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.0  
##                                                                   
##       x2b              y2b              xyb              x2yb       
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 3.000   1st Qu.: 4.000   1st Qu.: 7.000   1st Qu.: 5.000  
##  Median : 4.000   Median : 5.000   Median : 8.000   Median : 6.000  
##  Mean   : 4.629   Mean   : 5.179   Mean   : 8.282   Mean   : 6.454  
##  3rd Qu.: 6.000   3rd Qu.: 7.000   3rd Qu.:10.000   3rd Qu.: 8.000  
##  Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.000  
##                                                                     
##       xy2b            xedge           xedge_y           yedge       
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 7.000   1st Qu.: 1.000   1st Qu.: 8.000   1st Qu.: 2.000  
##  Median : 8.000   Median : 3.000   Median : 8.000   Median : 3.000  
##  Mean   : 7.929   Mean   : 3.046   Mean   : 8.339   Mean   : 3.692  
##  3rd Qu.: 9.000   3rd Qu.: 4.000   3rd Qu.: 9.000   3rd Qu.: 5.000  
##  Max.   :15.000   Max.   :15.000   Max.   :15.000   Max.   :15.000  
##                                                                     
##     yedge_x      
##  Min.   : 0.000  
##  1st Qu.: 7.000  
##  Median : 8.000  
##  Mean   : 7.801  
##  3rd Qu.: 9.000  
##  Max.   :15.000  
## 
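The loading code is not shown. The sketch below would produce a data frame with the structure listed above. It assumes a header-less, comma-separated file with the letter in the first column and 16 integer features after it, which is the layout of the UCI letter-recognition.data file. The function name load_letters and the file name in the usage lines are hypothetical. The column names are copied from the str() output.

```r
# Column names as they appear in the str() output above
letter_cols <- c("charac", "xhash", "yhash", "breadth", "depth", "onpix",
                 "xb", "yb", "x2b", "y2b", "xyb", "x2yb", "xy2b",
                 "xedge", "xedge_y", "yedge", "yedge_x")

# Hypothetical loader for a header-less CSV: the letter is read as text
# (so "T" or "F" are never mistaken for logicals), the 16 features as integers
load_letters <- function(path) {
  df <- read.csv(path, header = FALSE, col.names = letter_cols,
                 colClasses = c("character", rep("integer", 16)))
  df$charac <- factor(df$charac, levels = LETTERS)  # 26 levels, A to Z
  df
}

# Usage (file name is an assumption):
# letters_df <- load_letters("letter-recognition.data")
# str(letters_df)
# summary(letters_df)
```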

Building a training and a test dataset
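The split itself is not shown. The test set holds 5,000 observations, since the 4,229 correct and 771 incorrect predictions reported below add up to 5,000. That leaves 15,000 for training if all rows are used. The original text does not say whether the split was random or sequential. The sketch below holds out a random sample of rows; split_train_test and the seed value are hypothetical.

```r
# Hypothetical helper: hold out n_test randomly chosen rows as the test set
split_train_test <- function(df, n_test = 5000, seed = 123) {
  set.seed(seed)                        # reproducible split (sets global RNG)
  test_idx <- sample(nrow(df), n_test)  # row indices for the test set
  list(train = df[-test_idx, ], test = df[test_idx, ])
}

# Usage with a hypothetical data frame letters_df of 20,000 rows:
# splits <- split_train_test(letters_df)
# nrow(splits$train)  # 15000
# nrow(splits$test)   # 5000
```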

Building an SVM classifier with the linear (vanilladot) kernel
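A model like this one could be fitted with ksvm() from the kernlab package. kernlab's ksvm() prints the "Setting default kernel parameters" message below when a kernel such as vanilladot is used without an explicit kpar. The wrapper fit_letter_svm and the data frame letters_train are hypothetical names. The default cost C = 1 is kept because the original settings are not shown, so treat this as a sketch rather than the code behind the results.

```r
library(kernlab)  # provides ksvm() and the *dot kernel generators

# Hypothetical wrapper: fit a multiclass SVM predicting charac from all other
# columns. Because charac is a factor, ksvm() defaults to C-svc classification
# (one-against-one for more than two classes) and, with its default
# scaled = TRUE, standardizes the features before fitting.
fit_letter_svm <- function(train, kernel = "vanilladot", ...) {
  ksvm(charac ~ ., data = train, kernel = kernel, ...)
}

# Usage with a hypothetical training data frame letters_train:
# letter_classifier <- fit_letter_svm(letters_train)  # linear kernel
# letter_classifier                                    # prints a model summary
```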

##  Setting default kernel parameters

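Predicting the characters in the test dataset

In the confusion matrix below, rows are predicted letters and columns are actual letters, so correct predictions lie on the diagonal.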
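The prediction and cross-tabulation code is not shown either. With kernlab, predictions for a fitted ksvm model come from predict(model, newdata). The helper below builds a table with predicted letters in the rows and actual letters in the columns, like the output that follows. It also adds the share of each actual letter that was classified correctly, which makes a 26 by 26 table easier to scan. It is a sketch, and the function name is hypothetical.

```r
# Cross-tabulate predicted (rows) against actual (columns) labels and compute,
# for each actual label, the fraction predicted correctly (per-class recall)
confusion_summary <- function(predicted, actual) {
  lv <- union(levels(actual), levels(predicted))
  cm <- table(factor(predicted, levels = lv), factor(actual, levels = lv),
              dnn = c("predicted", "actual"))
  list(matrix = cm, recall = diag(cm) / colSums(cm))
}
```

For the letter model this might be called as confusion_summary(predict(letter_classifier, letters_test), letters_test$charac). Here letter_classifier and letters_test are hypothetical names for the fitted model and the test data frame.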

##                   
## charac_predictions   A   B   C   D   E   F   G   H   I   J   K   L   M   N
##                  A 191   0   0   2   0   0   0   0   0   1   0   0   0   0
##                  B   0 171   0   7   1   1   0   5   1   0   2   0   0   0
##                  C   0   0 150   0   1   0   5   3   0   0   5   2   0   0
##                  D   0   6   0 184   0   0   4   9   5   4   1   0   0   5
##                  E   0   2   9   0 161   3   1   0   0   0   1   9   0   0
##                  F   0   0   1   0   1 162   1   6   4   0   0   0   0   0
##                  G   0   3   7   1   9   4 149   2   1   0   4   5   0   0
##                  H   0   0   2   1   0   0   1 133   0   1   3   2   2   8
##                  I   0   3   0   0   0   1   0   0 163  15   0   0   0   0
##                  J   1   0   0   0   0   0   0   2   5 159   0   0   0   0
##                  K   1   2   5   1   1   0   5   4   0   0 137   1   0   0
##                  L   0   0   1   0   1   0   3   0   0   0   2 163   0   0
##                  M   1   0   2   1   0   0   1   0   0   0   0   0 184   5
##                  N   0   0   0   3   0   0   0   2   0   2   0   0   1 177
##                  O   0   0   4   0   0   0   1  11   0   1   0   0   0   1
##                  P   0   0   0   0   0   2   1   0   0   0   0   0   0   0
##                  Q   0   0   0   0   2   0  10   3   0   0   0   4   0   0
##                  R   1  15   0   1   1   0   3   9   0   0  16   0   5   2
##                  S   1   4   0   0   3   3   2   0   2   1   0   2   0   0
##                  T   0   1   1   0   4   8   0   0   0   0   0   0   0   0
##                  U   1   0   3   0   0   0   0   2   0   0   0   0   2   0
##                  V   0   1   0   0   0   0   4   0   0   0   0   0   0   2
##                  W   0   0   0   0   0   0   1   0   0   0   0   0   8   0
##                  X   0   1   0   0   1   0   0   3   0   0   8   4   0   0
##                  Y   4   0   0   0   0   1   0   0   1   0   0   0   0   1
##                  Z   1   0   0   0   3   1   0   0   4   2   0   0   0   0
##                   
## charac_predictions   O   P   Q   R   S   T   U   V   W   X   Y   Z
##                  A   4   0   5   2   0   0   2   2   1   0   1   1
##                  B   0   0   1   4  10   0   0   4   0   0   0   0
##                  C   2   0   1   0   1   0   0   0   0   0   0   0
##                  D   6   1   0   6   1   0   0   0   0   2   0   0
##                  E   0   0   5   0  10   0   0   0   0   2   0   7
##                  F   0  15   0   0   7   2   0   0   0   1   4   0
##                  G   0   3  10   3   5   1   0   1   2   2   0   0
##                  H  16   0   0   3   0   0   1   4   2   0   2   0
##                  I   0   0   1   0   1   0   0   0   0   2   1   1
##                  J   0   0   2   0   1   0   0   0   0   1   0   4
##                  K   0   1   0   8   0   2   2   0   0   5   0   0
##                  L   0   0   0   1   4   0   0   0   0   1   0   1
##                  M   2   0   0   0   0   0   5   0   8   0   1   0
##                  N   0   0   0   1   0   0   0   1   0   0   0   0
##                  O 144   0   4   1   0   0   2   0   2   0   0   0
##                  P   0 145   0   0   0   0   0   1   0   0   1   0
##                  Q   0   1 179   2   6   0   0   0   0   0   3   1
##                  R   1   0   0 151   2   0   0   0   0   2   0   0
##                  S   0   1   7   0 119   4   0   0   0   1   1  15
##                  T   0   0   0   1   4 181   0   0   0   3   3   3
##                  U   2   0   0   0   0   1 173   0   0   0   0   0
##                  V   0   1   0   0   0   0   0 170   1   0   7   0
##                  W   5   0   0   0   0   0   1   2 168   0   0   0
##                  X   0   0   0   0   0   2   0   0   0 186   1   0
##                  Y   0   2   0   0   0   4   0   2   0   0 194   0
##                  Z   0   0   1   0  24   1   0   0   0   0   0 135
## correct_predictn
## FALSE  TRUE 
##   771  4229
## correct_predictn
##  FALSE   TRUE 
## 0.1542 0.8458
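The two agreement tables above report the matching labels first as counts and then as proportions: 4,229 / 5,000 = 0.8458 correct and 771 / 5,000 = 0.1542 incorrect. A minimal base-R sketch of that computation, with a hypothetical function name, is:

```r
# Compare predictions with true labels; report counts and proportions of
# FALSE (wrong) and TRUE (correct), in that order
accuracy_table <- function(predicted, actual) {
  correct <- as.character(predicted) == as.character(actual)
  counts <- table(factor(correct, levels = c(FALSE, TRUE)))
  list(counts = counts, proportions = prop.table(counts))
}

# Reproducing the reported split with stand-in labels:
res <- accuracy_table(rep("A", 5000), c(rep("A", 4229), rep("B", 771)))
res$counts       # FALSE 771, TRUE 4229
res$proportions  # FALSE 0.1542, TRUE 0.8458
```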

Evaluating model performance using the rbfdot (Gaussian radial basis) kernel
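In kernlab's parameterization, the rbfdot kernel measures similarity as k(x, y) = exp(-sigma * ||x - y||^2). Unlike the other three models, this one prints no "Setting default kernel parameters" message. With the default kpar = "automatic", ksvm() instead estimates sigma from the training data (via sigest()) for this kernel. The function below is a hand-written illustration of the formula, not kernlab's implementation, and rbf_kernel is a hypothetical name.

```r
# Gaussian RBF kernel in kernlab's parameterization:
# k(x, y) = exp(-sigma * ||x - y||^2)
rbf_kernel <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

# Similarity decays with squared Euclidean distance:
rbf_kernel(c(1, 2), c(1, 2), sigma = 0.5)  # identical points: 1
rbf_kernel(c(0, 0), c(1, 1), sigma = 0.5)  # exp(-1), about 0.368

# Fitting the letter model with this kernel (hypothetical object names):
# letter_classifier_rbf <- kernlab::ksvm(charac ~ ., data = letters_train,
#                                        kernel = "rbfdot")
```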

Predicting the characters in the test dataset using the rbfdot model

##                       
## charac_predictions_rbf   A   B   C   D   E   F   G   H   I   J   K   L   M
##                      A 197   0   0   0   0   0   0   0   0   0   0   0   0
##                      B   0 196   0   3   2   3   1   5   0   0   0   0   5
##                      C   1   0 162   0   2   0   1   0   0   0   0   0   0
##                      D   0   2   0 193   0   1   3   4   3   1   0   0   0
##                      E   0   1   7   0 171   6   0   0   0   1   0   9   0
##                      F   0   0   0   0   0 169   0   1   3   0   0   0   0
##                      G   0   0   5   0  10   0 184   4   0   0   0   5   1
##                      H   0   0   1   2   0   1   0 162   0   1   3   1   0
##                      I   0   1   0   0   0   0   0   0 171   8   0   0   0
##                      J   0   0   0   0   0   0   0   0   7 170   0   0   0
##                      K   0   0   0   0   1   0   1   5   0   0 161   0   0
##                      L   0   0   0   0   0   0   0   0   0   0   0 170   0
##                      M   1   0   1   0   0   0   0   0   0   0   0   0 192
##                      N   0   0   0   1   0   0   0   1   0   1   0   0   1
##                      O   0   0   6   0   0   0   1   5   0   2   0   0   0
##                      P   0   0   0   0   0   2   0   0   0   0   0   0   0
##                      Q   0   0   0   0   0   0   0   1   0   1   0   0   0
##                      R   0   7   1   1   1   0   0   4   0   0  11   2   0
##                      S   0   2   0   0   0   0   0   0   1   1   0   2   0
##                      T   0   0   0   0   0   3   0   0   0   0   0   0   0
##                      U   0   0   2   1   0   0   0   0   0   0   0   0   0
##                      V   0   0   0   0   0   0   0   0   0   0   0   0   0
##                      W   0   0   0   0   0   0   1   0   0   0   0   0   3
##                      X   0   0   0   0   0   0   0   0   0   0   4   3   0
##                      Y   3   0   0   0   0   1   0   2   0   0   0   0   0
##                      Z   0   0   0   0   2   0   0   0   1   0   0   0   0
##                       
## charac_predictions_rbf   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
##                      A   0   0   0   0   0   0   0   0   0   0   0   0   0
##                      B   0   0   0   1   2   1   0   0   5   1   0   0   0
##                      C   0   0   0   0   0   1   0   0   0   0   0   0   0
##                      D   4   6   1   0   3   0   0   0   0   0   1   0   0
##                      E   0   0   0   0   0   1   1   0   0   0   1   0   3
##                      F   0   0  13   0   0   6   2   0   1   0   0   0   0
##                      G   0   1   1   2   1   0   0   0   0   2   0   0   0
##                      H   3   0   0   1   0   0   4   0   1   1   0   2   0
##                      I   0   0   0   0   0   0   0   0   0   0   0   0   0
##                      J   0   0   0   0   0   0   0   0   0   0   0   0   0
##                      K   0   0   1   0   2   0   0   2   0   0   3   0   0
##                      L   0   0   0   0   0   0   0   0   0   0   0   0   0
##                      M   1   0   0   0   0   0   0   2   0   3   0   1   0
##                      N 180   0   0   0   1   0   0   0   0   0   0   0   0
##                      O   4 166   0   3   0   0   0   1   0   2   1   0   0
##                      P   0   0 152   0   0   0   1   0   0   0   0   1   0
##                      Q   0   3   1 208   0   0   0   0   0   0   0   0   3
##                      R   7   0   0   0 174   1   0   0   4   0   1   0   0
##                      S   0   0   0   1   0 184   2   0   0   0   0   0   1
##                      T   0   0   0   0   0   0 184   0   0   0   0   2   0
##                      U   0   2   0   0   0   0   1 178   0   1   0   0   0
##                      V   0   0   0   0   0   0   0   0 174   1   0   3   0
##                      W   1   4   0   0   0   0   0   3   1 173   0   0   0
##                      X   0   0   0   0   0   0   2   0   0   0 201   1   1
##                      Y   1   0   1   0   0   0   1   0   1   0   0 209   0
##                      Z   0   0   0   0   0   1   0   0   0   0   0   0 160
## correct_predictn_rbf
## FALSE  TRUE 
##   359  4641
## correct_predictn_rbf
##  FALSE   TRUE 
## 0.0718 0.9282

Evaluating model performance using the polydot (polynomial) kernel

##  Setting default kernel parameters
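kernlab's polydot kernel is k(x, y) = (scale * <x, y> + offset)^degree. The defaults applied by the message above are degree = 1, scale = 1 and offset = 1, so this kernel is just the linear kernel plus the constant 1. In the SVM dual the coefficients satisfy sum_i alpha_i y_i = 0. A constant added to every kernel value therefore cancels out of both the objective and the decision function. This is consistent with the polydot results below matching the vanilladot results exactly. The functions are hand-written illustrations with hypothetical names.

```r
# Polynomial kernel in kernlab's parameterization:
# k(x, y) = (scale * <x, y> + offset)^degree
poly_kernel <- function(x, y, degree = 1, scale = 1, offset = 1) {
  (scale * sum(x * y) + offset)^degree
}
linear_kernel <- function(x, y) sum(x * y)  # what vanilladot computes

x <- c(3, 7, 2)
y <- c(5, 1, 4)
poly_kernel(x, y) - linear_kernel(x, y)  # 1: defaults give linear plus a constant
poly_kernel(x, y, degree = 2)            # 961: a genuinely nonlinear mapping
```

A genuinely polynomial model would need an explicit degree, for example kpar = list(degree = 2) in the ksvm() call. This example does not report its accuracy on the letter data.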

Predicting the characters in the test dataset using the polydot model

##                        
## charac_predictions_poly   A   B   C   D   E   F   G   H   I   J   K   L
##                       A 191   0   0   2   0   0   0   0   0   1   0   0
##                       B   0 171   0   7   1   1   0   5   1   0   2   0
##                       C   0   0 150   0   1   0   5   3   0   0   5   2
##                       D   0   6   0 184   0   0   4   9   5   4   1   0
##                       E   0   2   9   0 161   3   1   0   0   0   1   9
##                       F   0   0   1   0   1 162   1   6   4   0   0   0
##                       G   0   3   7   1   9   4 149   2   1   0   4   5
##                       H   0   0   2   1   0   0   1 133   0   1   3   2
##                       I   0   3   0   0   0   1   0   0 163  15   0   0
##                       J   1   0   0   0   0   0   0   2   5 159   0   0
##                       K   1   2   5   1   1   0   5   4   0   0 137   1
##                       L   0   0   1   0   1   0   3   0   0   0   2 163
##                       M   1   0   2   1   0   0   1   0   0   0   0   0
##                       N   0   0   0   3   0   0   0   2   0   2   0   0
##                       O   0   0   4   0   0   0   1  11   0   1   0   0
##                       P   0   0   0   0   0   2   1   0   0   0   0   0
##                       Q   0   0   0   0   2   0  10   3   0   0   0   4
##                       R   1  15   0   1   1   0   3   9   0   0  16   0
##                       S   1   4   0   0   3   3   2   0   2   1   0   2
##                       T   0   1   1   0   4   8   0   0   0   0   0   0
##                       U   1   0   3   0   0   0   0   2   0   0   0   0
##                       V   0   1   0   0   0   0   4   0   0   0   0   0
##                       W   0   0   0   0   0   0   1   0   0   0   0   0
##                       X   0   1   0   0   1   0   0   3   0   0   8   4
##                       Y   4   0   0   0   0   1   0   0   1   0   0   0
##                       Z   1   0   0   0   3   1   0   0   4   2   0   0
##                        
## charac_predictions_poly   M   N   O   P   Q   R   S   T   U   V   W   X
##                       A   0   0   4   0   5   2   0   0   2   2   1   0
##                       B   0   0   0   0   1   4  10   0   0   4   0   0
##                       C   0   0   2   0   1   0   1   0   0   0   0   0
##                       D   0   5   6   1   0   6   1   0   0   0   0   2
##                       E   0   0   0   0   5   0  10   0   0   0   0   2
##                       F   0   0   0  15   0   0   7   2   0   0   0   1
##                       G   0   0   0   3  10   3   5   1   0   1   2   2
##                       H   2   8  16   0   0   3   0   0   1   4   2   0
##                       I   0   0   0   0   1   0   1   0   0   0   0   2
##                       J   0   0   0   0   2   0   1   0   0   0   0   1
##                       K   0   0   0   1   0   8   0   2   2   0   0   5
##                       L   0   0   0   0   0   1   4   0   0   0   0   1
##                       M 184   5   2   0   0   0   0   0   5   0   8   0
##                       N   1 177   0   0   0   1   0   0   0   1   0   0
##                       O   0   1 144   0   4   1   0   0   2   0   2   0
##                       P   0   0   0 145   0   0   0   0   0   1   0   0
##                       Q   0   0   0   1 179   2   6   0   0   0   0   0
##                       R   5   2   1   0   0 151   2   0   0   0   0   2
##                       S   0   0   0   1   7   0 119   4   0   0   0   1
##                       T   0   0   0   0   0   1   4 181   0   0   0   3
##                       U   2   0   2   0   0   0   0   1 173   0   0   0
##                       V   0   2   0   1   0   0   0   0   0 170   1   0
##                       W   8   0   5   0   0   0   0   0   1   2 168   0
##                       X   0   0   0   0   0   0   0   2   0   0   0 186
##                       Y   0   1   0   2   0   0   0   4   0   2   0   0
##                       Z   0   0   0   0   1   0  24   1   0   0   0   0
##                        
## charac_predictions_poly   Y   Z
##                       A   1   1
##                       B   0   0
##                       C   0   0
##                       D   0   0
##                       E   0   7
##                       F   4   0
##                       G   0   0
##                       H   2   0
##                       I   1   1
##                       J   0   4
##                       K   0   0
##                       L   0   1
##                       M   1   0
##                       N   0   0
##                       O   0   0
##                       P   1   0
##                       Q   3   1
##                       R   0   0
##                       S   1  15
##                       T   3   3
##                       U   0   0
##                       V   7   0
##                       W   0   0
##                       X   1   0
##                       Y 194   0
##                       Z   0 135
## correct_predictn_poly
## FALSE  TRUE 
##   771  4229
## correct_predictn_poly
##  FALSE   TRUE 
## 0.1542 0.8458

Evaluating model performance using the tanhdot (hyperbolic tangent) kernel

##  Setting default kernel parameters
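kernlab's tanhdot kernel is k(x, y) = tanh(scale * <x, y> + offset), with defaults scale = 1 and offset = 1. Assume ksvm()'s default scaled = TRUE was in effect, so each of the 16 features is standardized. The inner product of two letters is then a sum of 16 products and easily lands far from zero, where tanh is nearly flat at -1 or +1. One plausible reading of the 8.24% accuracy below is that the kernel values carry little information at these settings; random guessing among 26 letters would give about 3.8%. The sigmoid kernel is also not positive semidefinite for all parameter values. The sketch illustrates the saturation on made-up vectors, and tanh_kernel is a hypothetical name.

```r
# Hyperbolic tangent (sigmoid) kernel in kernlab's parameterization:
# k(x, y) = tanh(scale * <x, y> + offset)
tanh_kernel <- function(x, y, scale = 1, offset = 1) {
  tanh(scale * sum(x * y) + offset)
}

set.seed(42)
# Two made-up vectors standing in for standardized 16-feature letters
a <- rnorm(16)
b <- rnorm(16)

tanh_kernel(a, a)                  # self-similarity: saturated at about 1
tanh_kernel(a, b)                  # other pairs often saturate as well
tanh_kernel(a, b, scale = 1 / 16)  # a smaller scale tends to stay off the plateau
```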

Predicting the characters in the test dataset using the tanhdot model

##                       
## charac_predictions_tan  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q
##                      A 29 34  1 27  6  3 12 39 18 32 30 33 63 22 42  2 67
##                      B  2 55 10 19 13  6 19  8  4  3  9  3  6  0  8  4 27
##                      C  0  0 13  0  0  0  6  0  1  4  3  1  0  0  2  0  0
##                      D  2  3  1 23  0 15  1  0  1  0  0  4  0  4  6  9  2
##                      E  0  0 23  0  7  0 10  0  2  0  2  2  0  0  0  0  1
##                      F  0  4  9  2  9 49  1  6 51 21  1  0  0  7  1 88  0
##                      G  1  1  9  1  3  1  3 16  0  0 25 19  0 10  8  0 27
##                      H  5  6  6 27  1  7  9  9  0  2  1  6 23 15 21 17 16
##                      I 74 27 12 35 17 17 12  2 17 70  5 18  0  3  0  5  1
##                      J 29 13  8 18 25  1 34 15 38 20 10 14  7  4 28  5 18
##                      K  7  1 28  0 11 16  9  5  1 11 14 20 16  7  8  1  1
##                      L 38 15 34 11 67  0 48 17 27  1 29 25  0  5 25  0 14
##                      M  1  3  0  1  0  0  1  5  0  0  2  0 11  9  0  0  0
##                      N  0  0  0  1  0  0  0  4  0  1  0  0 26 12  1  1  1
##                      O  1  5  1  7  0  0  3 42  0  5  2 19 34 34 21  4  7
##                      P  0  5  0  5  0  8  1  2  0  4  0  0  0  2  0  5  0
##                      Q  1  8  1  0  1  0 15  2  0  1  1  5  1  2  0  1  4
##                      R  5  5  6  3  5  6  1  1  2  3  2  2  0  1  1  1  0
##                      S  0 17  1  8 11  3  3  3  1  2  2  4  0  1  3  3 11
##                      T  4  0 16 10  8 36  0  4  5  0 15  0  1 11  2 12  0
##                      U  0  0  1  3  1  0  1  3  0  0  2  1 11 12  1  0  0
##                      V  2  0  4  0  2 15  1 11  0  1 15  1  1 30  2  9 12
##                      W  0  1  0  0  0  1  2  0  1  0  4  3  2  6  1  2  3
##                      X  0  0  1  0  0  1  0  0 16  3  5  7  0  0  0  0  0
##                      Y  0  0  0  0  0  0  0  0  1  0  0  1  0  4  0  1  4
##                      Z  1  6  0  0  2  1  0  0  0  2  0  4  0  0  1  0  0
##                       
## charac_predictions_tan  R  S  T  U  V  W  X  Y  Z
##                      A 38  9  1 14  3  6 24  1  6
##                      B 41 44  2  0  1  2 19  4 26
##                      C  0  2  0  1  2  0  0  0  0
##                      D  3  0 18  0  0  0  2  5  2
##                      E  0  0  2  0  0  0  1  0  6
##                      F  1 23 44  4 36 14  5 36 19
##                      G  1  4  1  6  1  1  2  0  0
##                      H 12  0  1 13 13 15  0 14  1
##                      I 22 20  8  0  1  0 32  7 21
##                      J 23 32  3  1  0  0 34  0 20
##                      K  6  0  2 11 11  5  0  3  0
##                      L 12 16  5 11  0  0 40  0 35
##                      M  1  0  1  0  2  1  0  0  0
##                      N  0  0  0  5 19 44  0 13  0
##                      O  3  2  0 25  0  2  1  0  0
##                      P  0  0  8  0 13  0  0 20  0
##                      Q 10 11  1  5  4  5  1  8  2
##                      R  5  6  0  0  0  0  3  0  1
##                      S  2  8  8  0  2  0  4  2 12
##                      T  0  6 11 42  6  1 22 19 14
##                      U  0  0  7 14 15 16  3  2  0
##                      V  0  3 28 27 38 54  6 49  1
##                      W  1  0  1  0  6  2  0  7  0
##                      X  2  0 35  2  2  0  8 18  1
##                      Y  0  1 10  5 12 16  0  8  0
##                      Z  0  8  1  0  0  0  1  3  1
## correct_predictn_tan
## FALSE  TRUE 
##  4588   412
## correct_predictn_tan
##  FALSE   TRUE 
## 0.9176 0.0824

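Based on the results obtained with the four kernels, vanilladot (linear), rbfdot (Gaussian radial basis), polydot (polynomial) and tanhdot (hyperbolic tangent, or sigmoid), the nonlinear radial basis mapping gives the most accurate result. It classifies 92.82% of the test characters correctly, against 84.58% for vanilladot and polydot and 8.24% for tanhdot. The vanilladot and polydot results are identical because polydot's default degree of 1 makes it a linear kernel plus a constant.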
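The counts printed for each kernel can be tabulated to make the comparison explicit. The data frame below only restates those reported numbers; no model is re-run.

```r
# Test-set results reported above (5,000 test letters per kernel)
results <- data.frame(
  kernel  = c("vanilladot", "rbfdot", "polydot", "tanhdot"),
  correct = c(4229, 4641, 4229, 412),
  wrong   = c(771, 359, 771, 4588)
)
results$accuracy <- results$correct / (results$correct + results$wrong)
results[order(-results$accuracy), ]  # rbfdot first, tanhdot last

# Relative reduction in errors when moving from the linear to the RBF kernel
1 - 359 / 771  # about 0.534
```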