Introduction

The objective of this work is to build a model that can identify a handwritten digit and recognize which number it represents. To train the model I used the MNIST dataset; more information is available here: http://yann.lecun.com/exdb/mnist/.

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

The model I used to recognize characters is a prototype digit recognizer. It combines a Fourier-related transform called the Walsh-Hadamard Transform, a connected component transform, several KPIs computed from each training image, principal component analysis, and a feed-forward neural network with one hidden layer that takes all of these inputs and trains the weights of its connections to improve the predicted numbers.

In the following sections I describe the process step by step.

Binarization of the input

Each image in the dataset is a matrix of pixels with 28 rows and 28 columns. Because the training pictures are grayscale, each pixel is a number from 0 to 255. To work with the data we need to binarize the input: take those values from 0 to 255 and decide, pixel by pixel, whether each one becomes a 0 or a 1.

In these two sets of images we can see an example of how the binarization takes place: the upper images show the raw data, and the second set shows the binarized data.

Original Images

Binarized Images

As we can see, I had to choose a threshold that decides whether each pixel becomes a 1 or a 0. After several trials, the chosen value was 100 out of 255. This value gave me a good starting point for binarizing each image without major information loss.
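As a minimal sketch of this thresholding step, assuming each image is stored as a numeric matrix with values from 0 to 255: the function name `binarize` and the strict `>` comparison are my assumptions for illustration; only the threshold of 100 comes from the text.

```r
# Hypothetical helper: map grayscale pixels (0-255) to 0/1.
binarize <- function(img, threshold = 100) {
  # Pixels brighter than the threshold become 1, all others 0.
  # (Strict ">" is an assumption; the text only gives the value 100.)
  (img > threshold) * 1L
}

# Toy 3 x 3 "image" standing in for a real 28 x 28 digit.
img <- matrix(c(  0,  50, 100,
                101, 200, 255,
                 99, 150,   0), nrow = 3, byrow = TRUE)
bin <- binarize(img)
print(bin)
```

Lowering the threshold keeps fainter strokes at the cost of more noise, which is why it is worth trying a few values before fixing one.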

KPI Calculation

In the following lines I explain all the inputs used for the neural network. It is important to clarify that in this approach the neural network does not take the raw images as input. Instead, we need to construct a set of key performance indicators (KPIs) that differentiate each image from the rest and are different enough from one another not to duplicate information.

Sums per row and per column

For these two KPIs we compute, for each image (which has 28 by 28 pixels), the sum of pixels per column and per row. As an example we can see the column and row sums for one image. Along the axes, the row sums and column sums are drawn as vectors of colors for easier reading: the closer the color is to white, the larger the sum for that row or column; the darker the color, the lower the sum.
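A short sketch of these two KPIs, assuming the binarized image is a 0/1 matrix. The toy 4 x 4 matrix below is only a stand-in; for a 28 x 28 image each call returns 28 values.

```r
# Toy binarized image (4 x 4 here; 28 x 28 for the real digits).
bin <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 1,
                1, 1, 1, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

row_sums <- rowSums(bin)  # one value per row
col_sums <- colSums(bin)  # one value per column

print(row_sums)
print(col_sums)
```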

Dual Images

For the following two KPIs we take each number and duplicate a part of it, as if placing a mirror along its middle. These KPIs help us identify characters whose first half is the same as their second half: for example, the 0 and the 8 in both directions, or the 3 in the vertical mirror.

As an example we see the original picture and the dual picture of a character 3.
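The dual images could be built roughly as follows. This is a sketch under the convention used in the parameter descriptions of this document (the lower half is kept and reflected upward, the left half is kept and reflected to the right); the function names are hypothetical and the code assumes an even number of rows and columns, as with 28 x 28 images.

```r
# Hypothetical helpers; assume an even number of rows/columns.
horizontal_dual <- function(img) {
  n <- nrow(img)
  lower <- img[(n / 2 + 1):n, , drop = FALSE]    # keep the lower half
  upper <- lower[nrow(lower):1, , drop = FALSE]  # flip it top-to-bottom
  rbind(upper, lower)
}

vertical_dual <- function(img) {
  m <- ncol(img)
  left <- img[, 1:(m / 2), drop = FALSE]         # keep the left half
  right <- left[, ncol(left):1, drop = FALSE]    # flip it left-to-right
  cbind(left, right)
}

# Toy 4 x 4 matrix with distinct values so the reflection is easy to see.
img <- matrix(1:16, nrow = 4)
h <- horizontal_dual(img)
v <- vertical_dual(img)
print(h)
print(v)
```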

With these KPIs generated, we are now ready to move on to the neural network parameter generation process.

Parameters Generation

1. H30

For this parameter we generate the sum of pixels for the first 30% of rows of each number.

For this image the Row sum H30 is 34.

2. H50

For this parameter we generate the sum of pixels for the first 50% of rows of each number.

For this image the Row sum H50 is 48.

3. H80

For this parameter we generate the sum of pixels for the first 80% of rows of each number.

For this image the Row sum H80 is 117.

4. V30

For this parameter we generate the sum of pixels for the first 30% of columns of each number.

For this image the column sum V30 is 11.

5. V50

For this parameter we generate the sum of pixels for the first 50% of columns of each number.

For this image the column sum V50 is 46.

6. V80

For this parameter we generate the sum of pixels for the first 80% of columns of each number.

For this image the column sum V80 is 69.
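The six parameters above share one pattern, which can be sketched with a single helper. The name `partial_sum` and the choice to round the cut-off down with `floor()` are assumptions; the text does not say how 30% or 80% of 28 rows is rounded.

```r
# Hypothetical helper: sum of pixels in the first `frac` of rows or columns.
partial_sum <- function(bin, frac, by = c("rows", "cols")) {
  by <- match.arg(by)
  if (by == "rows") {
    k <- floor(frac * nrow(bin))   # rounding down is an assumption
    sum(bin[seq_len(k), , drop = FALSE])
  } else {
    k <- floor(frac * ncol(bin))
    sum(bin[, seq_len(k), drop = FALSE])
  }
}

# Toy 10 x 10 image whose first two rows are filled.
bin <- matrix(0, nrow = 10, ncol = 10)
bin[1:2, ] <- 1

params <- c(
  H30 = partial_sum(bin, 0.3, "rows"),
  H50 = partial_sum(bin, 0.5, "rows"),
  H80 = partial_sum(bin, 0.8, "rows"),
  V30 = partial_sum(bin, 0.3, "cols"),
  V50 = partial_sum(bin, 0.5, "cols"),
  V80 = partial_sum(bin, 0.8, "cols")
)
print(params)
```

Because the sums are cumulative, H30 <= H50 <= H80 and V30 <= V50 <= V80 always hold for a 0/1 image.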

7. Horizontal Similarity

Correlation between an input character sample ‘I’ and its horizontal mirror ‘X’. It should be noted that ‘X’ is generated from ‘I’, where the lower half of ‘X’ is the same as that of ‘I’ and the upper half of ‘X’ is the mirror image of its lower half.

For example, here we plot two binarized images and their corresponding mirrors. For the top image the correlation is 0.73, and for the bottom image the correlation is 0.67.

8. Vertical Similarity

Correlation between an input character sample ‘I’ and its vertical mirror ‘X’. It should be noted that ‘X’ is generated from ‘I’, where the left half of ‘X’ is the same as that of ‘I’ and the right half of ‘X’ is the mirror image of its left half.

For example, here we plot two binarized images and their corresponding mirrors. For the top image the correlation is 0.75, and for the bottom image the correlation is 0.5.
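A small sketch of how such a similarity score could be computed. I assume here that "correlation" means the Pearson correlation over all pixels, which is what `cor()` returns; the helper name `similarity` is hypothetical. The example builds the vertical mirror inline, and the horizontal case works the same way on rows.

```r
# Hypothetical helper: Pearson correlation between an image and its mirror,
# computed over all pixels (an assumption about the exact measure used).
similarity <- function(img, mirror) {
  cor(as.vector(img), as.vector(mirror))
}

# Toy 4 x 4 binarized image that is left-right symmetric.
img <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 1,
                1, 0, 0, 1,
                0, 1, 1, 0), nrow = 4, byrow = TRUE)

# Vertical mirror: left half kept, right half is its reflection.
left <- img[, 1:2]
mirror <- cbind(left, left[, 2:1])
sym_score <- similarity(img, mirror)

# Break the symmetry by switching on one pixel in the right half.
img2 <- img
img2[1, 4] <- 1
mirror2 <- cbind(img2[, 1:2], img2[, 2:1])
asym_score <- similarity(img2, mirror2)

print(c(symmetric = sym_score, asymmetric = asym_score))
```

A perfectly symmetric character scores 1, and the score drops as the two halves diverge.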

9. Walsh-Hadamard Transform

WHT DataBase Generation

To use the Walsh-Hadamard Transform we need to compare each image with default number images. For this reason I created 10 default number images and computed the Walsh-Hadamard Transform of each of them.

WHT Compare

I need to compare the given image with the WHT database by correlation. For example, for a given image, I show the image and its 10 correlations, one for each of the default characters.

For this image the correlation to each character is:

zero  one   two   three  four  five  six   seven  eight  nine
0.41  0.13  0.32  0.22   0.3   0.28  0.44  0.14   0.29   0.27

As we can see, the WHT gives a good approximation to the number we are looking for. In this case it ranks the zero second (0.41), very close to the first place, which goes to the six (0.44).
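The comparison can be sketched in base R as follows. Several things here are assumptions rather than the method actually used: the Hadamard matrix is built with the Sylvester construction, which needs a power-of-two size, so the 28 x 28 image is zero-padded to 32 x 32; the correlation is taken over all transform coefficients; and the function names are hypothetical. Random matrices stand in for the default digit images.

```r
# Sylvester construction of a Hadamard matrix of order n (n a power of two).
hadamard <- function(n) {
  stopifnot(n >= 1, bitwAnd(n, n - 1) == 0)
  H <- matrix(1, 1, 1)
  while (nrow(H) < n) {
    H <- rbind(cbind(H, H), cbind(H, -H))
  }
  H
}

# 2D Walsh-Hadamard transform of an image, zero-padded to n x n.
# Padding 28 x 28 to 32 x 32 is an assumption made for this sketch.
wht2d <- function(img, n = 32) {
  padded <- matrix(0, n, n)
  padded[seq_len(nrow(img)), seq_len(ncol(img))] <- img
  H <- hadamard(n)
  H %*% padded %*% H
}

# Correlate the WHT of an image with the WHT of each template image.
wht_compare <- function(img, templates) {
  w <- as.vector(wht2d(img))
  sapply(templates, function(tpl) cor(w, as.vector(wht2d(tpl))))
}

# Toy templates: random binary images standing in for the default digits.
set.seed(1)
templates <- list(
  zero = matrix(rbinom(28 * 28, 1, 0.3), 28, 28),
  one  = matrix(rbinom(28 * 28, 1, 0.3), 28, 28)
)
scores <- wht_compare(templates$zero, templates)
print(round(scores, 2))
```

Each image then contributes one correlation per template, which gives the ten WHT inputs shown in the table above.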

10. Connected Component Transform

It gives a measure of the number of closed areas in a character. In an image, all connected pixels are given the same label. Thus, if there are two such sets of pixels, as in ‘8’, they are labeled ‘1’ and ‘2’, and the highest value of ‘label’ gives an idea of the closed areas present in the character.

We use the connected component transform (Rosenfeld and Pfalz, 1966) and show its result for 10 numbers in small black text at the top of each image. This result is the number of components inside each number.
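To make the labeling idea concrete, here is a simple flood-fill sketch. It is not the Rosenfeld and Pfalz two-pass algorithm, just a plain stack-based labeling with 4-connectivity, and the helper names are hypothetical. Reading "closed areas" as background regions fully enclosed by the stroke is my assumption.

```r
# Hypothetical helper: count 4-connected regions of cells equal to `value`.
count_components <- function(bin, value = 1) {
  nr <- nrow(bin); nc <- ncol(bin)
  labels <- matrix(0L, nr, nc)
  current <- 0L
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      if (bin[i, j] != value || labels[i, j] != 0L) next
      current <- current + 1L            # start a new label
      labels[i, j] <- current
      stack <- list(c(i, j))
      while (length(stack) > 0) {
        p <- stack[[length(stack)]]      # pop the last cell
        stack[[length(stack)]] <- NULL
        for (d in list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))) {
          r <- p[1] + d[1]; cc <- p[2] + d[2]
          if (r >= 1 && r <= nr && cc >= 1 && cc <= nc &&
              bin[r, cc] == value && labels[r, cc] == 0L) {
            labels[r, cc] <- current
            stack[[length(stack) + 1]] <- c(r, cc)
          }
        }
      }
    }
  }
  current  # the highest label is the number of regions
}

# Closed areas: background regions, minus the outer background.
count_holes <- function(bin) {
  padded <- matrix(0, nrow(bin) + 2, ncol(bin) + 2)  # zero border
  padded[2:(nrow(bin) + 1), 2:(ncol(bin) + 1)] <- bin
  count_components(padded, value = 0) - 1
}

# Toy "8": one stroke enclosing two closed areas.
eight <- matrix(c(1, 1, 1, 1, 1,
                  1, 0, 0, 0, 1,
                  1, 1, 1, 1, 1,
                  1, 0, 0, 0, 1,
                  1, 1, 1, 1, 1), nrow = 5, byrow = TRUE)
print(count_components(eight))
print(count_holes(eight))
```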

11. Principal Component Analysis

We have 28x28 pixels per image, and most of them are 0 in any given image, so most pixels give us no information about which number we need to predict. We run a Principal Component Analysis to find the principal components of the dataset that capture the most variance, and then add these predictors to the neural network input.

We apply PCA to the covariance matrix and check how many components we should keep.

According to the plot, we keep the first 15 components, which represent almost 90% of the variance, to fit the model.

train_score <- as.matrix(training) %*% train_pc$rotation[,1:15]
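The component selection can be sketched with `prcomp()` on toy data. The names `training`, `train_pc` and `train_score` follow the line above; the random matrix, the 90% cut-off rule and the explicit centering step are assumptions of this sketch. If `train_pc` comes from `prcomp()` with its default centering, subtracting `train_pc$center` before projecting reproduces the scores stored in `train_pc$x`.

```r
set.seed(42)
# Toy stand-in for the flattened training images (rows = images).
training <- matrix(rnorm(200 * 10), nrow = 200)
training[, 2] <- training[, 1] + rnorm(200, sd = 0.1)  # a correlated column

train_pc <- prcomp(training)  # centers the columns by default

# Cumulative share of variance explained by the first k components.
cum_var <- cumsum(train_pc$sdev^2) / sum(train_pc$sdev^2)
k <- which(cum_var >= 0.90)[1]  # smallest k reaching 90%

# Project onto the first k components, centering with the same
# column means that prcomp() used internally.
centered <- sweep(training, 2, train_pc$center)
train_score <- centered %*% train_pc$rotation[, 1:k, drop = FALSE]

print(k)
print(round(cum_var, 3))
```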

Final Data Set

Data: data set of all the calculated KPIs per image.

We show the names of the variables of the data set.

In the following graphs we can see a density plot per variable, broken down by the number it represents.

Train and Test

In this step we center and scale all the variables so that there is no difference in magnitude between them and all variables start with the same importance. This is a key step for the neural network training to succeed. We also create train and test partitions of the data.

I show a couple of rows to see the final Data set in action.
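A base R sketch of this step on toy data. The data frame `kpi`, its two example columns and the 80/20 split are assumptions for illustration; the real data set has 34 predictors, and the split proportion is not stated in the text.

```r
set.seed(123)
# Toy stand-in for the KPI data set: a label column plus numeric predictors.
kpi <- data.frame(label = factor(sample(0:9, 100, replace = TRUE)),
                  H30 = runif(100, 0, 50),
                  V30 = runif(100, 0, 30))

# Center and scale every predictor (mean 0, standard deviation 1).
predictors <- setdiff(names(kpi), "label")
kpi[predictors] <- scale(kpi[predictors])

# Random 80/20 split; the proportion is an assumption.
train_idx <- sample(nrow(kpi), size = floor(0.8 * nrow(kpi)))
numbers.train <- kpi[train_idx, ]
numbers.test  <- kpi[-train_idx, ]

print(head(numbers.train, 3))
```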

Neural Network

In this step we train the Neural network.

For this specific training set, I have chosen a neural network with one hidden layer of 22 nodes. This choice was made after testing the network with several different numbers of hidden-layer nodes: I tried from 10 to 28 nodes, and the best accuracy for the model was obtained with 22 nodes.
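The search over hidden-layer sizes can be illustrated on a small scale with the `nnet` package directly. This is only a sketch: it uses the built-in `iris` data, three toy sizes and a single hold-out split, all of which are assumptions chosen to keep the example fast, not the setup used for the digits.

```r
library(nnet)  # recommended package shipped with standard R installations

set.seed(1)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]
valid <- iris[-idx, ]

sizes <- c(2, 4, 6)  # toy candidates; the text explored 10 to 28 nodes
acc <- sapply(sizes, function(s) {
  fit <- nnet(Species ~ ., data = train, size = s, decay = 0.5,
              maxit = 200, trace = FALSE)
  # Share of hold-out rows whose predicted class matches the true class.
  mean(predict(fit, valid, type = "class") == valid$Species)
})

results <- data.frame(size = sizes, accuracy = acc)
best_size <- results$size[which.max(results$accuracy)]
print(results)
```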

Model Results:

library(caret)  # provides train() and trainControl()
# Fixed tuning grid: one hidden layer with 22 nodes and weight decay 0.5
my.grid6 <- expand.grid(.decay = c(0.5), .size = c(22))
train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
numbers.fit6 <- train(label ~ ., data = numbers.traintotal, method = "nnet", maxit = 1000, trControl = train_control, tuneGrid = my.grid6, trace = FALSE, linout = 1)

Plot Neural Network

Neural Network 

42000 samples
   34 predictor
   10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 42000, 42000, 42000, 42000, 42000, 42000, ... 
Resampling results:

  Accuracy   Kappa    
  0.9566663  0.9518345

Tuning parameter 'size' was held constant at a value of 22
Tuning parameter 'decay'
 was held constant at a value of 0.5
 

Test Set Predictions table

For the test set we have a prediction accuracy of 97.1%.

Here we show the confusion matrix for the test set.
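As a small illustration of how a confusion matrix and the accuracy relate, here is a sketch with made-up labels for three classes. The vectors `actual` and `predicted` are toy values, not results from the model.

```r
# Toy labels for three classes (not real model output).
actual    <- factor(c(0, 1, 2, 2, 1, 0, 2, 1), levels = 0:2)
predicted <- factor(c(0, 1, 2, 1, 1, 0, 2, 2), levels = 0:2)

# Rows are predicted classes, columns are actual classes.
conf_mat <- table(Predicted = predicted, Actual = actual)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```

Off-diagonal cells show which digits get confused with which, something the single accuracy figure hides.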

Good predictions

I would like to show some images of good predictions:

Bad predictions

And also some pictures of badly predicted characters, with the predicted character shown in small black font.

As we can see in the previous graph, the badly predicted characters are not easy to distinguish, and even for a person they are hard to recognize.

