1 Introduction

Computer-generated Digits with/without Pre-Rotation and Handwritten Digits with/without Rotation

The main task of this mini-project is to implement multi-scale digit image recognition for the four given cases of digit image files shown in the figure above. The implemented algorithm is a Convolutional Neural Network (CNN) ending in a fully connected classification layer, calibrated with the MNIST database for multi-class label classification. In Section 2, I explain the methodology for constructing two versions of CNN nets in MATLAB: CNNDigitTrainedNet and CNNAugmented180Net. Results and discussion are given in Section 3, and the summary and conclusion in Section 4. Section 5 contains the answers to the theoretical part.

2 Methodology

  • The first step is to download the MNIST database for calibrating my Convolutional Neural Network for multi-class label classification. The MNIST dataset contains 70,000 handwritten digits, pre-partitioned into a training set of 60,000 images and a test set of 10,000 images, and is available from http://yann.lecun.com/exdb/mnist/. Each digit image in the MNIST dataset is a \(28 \times 28 \times 1\) pixel array paired with its class label. The figure below shows sample digit images generated from the MNIST dataset; a minimal loading sketch follows the figure.
Samples of digit images generated from the MNIST dataset: Without Rotation Augmentation (Left); and With Rotation Augmentation (Right)
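As a reference, the following minimal MATLAB sketch reads the raw MNIST IDX files into arrays usable by trainNetwork. The file names are the unzipped MNIST training files; the actual loading code used in this project may differ in detail.

```matlab
% Minimal reader for the raw MNIST IDX files (assumes the files from
% http://yann.lecun.com/exdb/mnist/ have been downloaded and unzipped).
fid = fopen('train-images-idx3-ubyte', 'r', 'b');   % 'b' = big-endian header
fread(fid, 4, 'int32');                             % skip magic, count, rows, cols
XTrain = fread(fid, inf, 'uint8=>uint8');
fclose(fid);
XTrain = reshape(XTrain, 28, 28, 1, []);            % 28 x 28 x 1 x 60000
XTrain = permute(XTrain, [2 1 3 4]);                % file is row-major, MATLAB is column-major

fid = fopen('train-labels-idx1-ubyte', 'r', 'b');
fread(fid, 2, 'int32');                             % skip magic, count
YTrain = categorical(fread(fid, inf, 'uint8=>uint8'));
fclose(fid);
```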

  • The second step is to construct two versions of Convolutional Neural Networks. The first version (named “CNNDigitTrainedNet”) uses the CNN layer stack “DigitCNNLayer” without data augmentation. The second version (named “CNNAugmented180Net”) uses “DigitCNNLayer” + “Rotation Augmented” so that it can be calibrated on rotated digit images. The “DigitCNNLayer” architecture ends in a fully connected layer with 10 neurons for label classification. The augmented version of my CNN nets is intended to learn label classification under rotations of up to \(360^{\circ}\) in the digit images. A sketch of both configurations is given after the figures below.
CNNDigitTrainedNet: DigitCNNLayer

CNNAugmented180Net: DigitCNNLayer + Rotation Augmented
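The exact layer stack of “DigitCNNLayer” is given only in the figures above. The MATLAB sketch below shows a representative stack of this kind together with the rotation augmenter used by CNNAugmented180Net; the convolution filter sizes and channel counts are illustrative assumptions, only the final 10-neuron fully connected layer is fixed by the design, and XTrain/YTrain come from the loading sketch above.

```matlab
% Representative "DigitCNNLayer" stack (convolution sizes are assumptions;
% only the 10-neuron fully connected output layer is fixed by the design).
DigitCNNLayer = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)       % one neuron per digit class 0-9
    softmaxLayer
    classificationLayer];

% Rotation augmentation used by CNNAugmented180Net: random rotation in [-180, 180] degrees.
augmenter        = imageDataAugmenter('RandRotation', [-180 180]);
augmentedTrainDs = augmentedImageDatastore([28 28 1], XTrain, YTrain, ...
                                           'DataAugmentation', augmenter);
```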

  • The third step is to set the training options for the CNN nets, as follows:

    • For training, I use stochastic gradient descent with momentum (momentum = 0.9), which updates the network parameters at every training iteration; the momentum term speeds up convergence and helps the optimizer avoid getting stuck in shallow local minima. A minimal sketch of these options is given below.
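The sketch below sets these options with MATLAB's trainingOptions and trains both nets. Only the momentum of 0.9 is fixed by the design; the learning rate, number of epochs, and validation set are illustrative assumptions (XTest/YTest would be the MNIST test set, loaded in the same way as the training set).

```matlab
% Stochastic gradient descent with momentum; only Momentum = 0.9 is fixed by
% the design, the remaining hyperparameters are illustrative assumptions.
options = trainingOptions('sgdm', ...
    'Momentum',         0.9, ...
    'InitialLearnRate', 0.01, ...
    'MaxEpochs',        4, ...
    'Shuffle',          'every-epoch', ...
    'ValidationData',   {XTest, YTest}, ...   % MNIST test set (assumed variables)
    'Verbose',          false, ...
    'Plots',            'training-progress');

% Train both versions; augmentedTrainDs comes from the architecture sketch above.
CNNDigitTrainedNet = trainNetwork(XTrain, YTrain, DigitCNNLayer, options);
CNNAugmented180Net = trainNetwork(augmentedTrainDs, DigitCNNLayer, options);
```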
  • The fourth step is to search for the digit positions using Gaussian filtering and subsampling; each detected digit is then fitted with a 9-square box. Details of how the background noise is removed, how the image is inverted from white to black, how the digit positions are searched, and how the 9-square box is fitted for each digit are given in my Lab-01 report. A rough sketch of this preprocessing is given below.
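The sketch below outlines the preprocessing under the assumption that the scanned sheet is an image with dark digits on a white background; the input file name, the Gaussian sigma, and the blob-size threshold are illustrative, and the exact procedure follows Lab-01.

```matlab
% Rough digit-search sketch (file name, sigma, and blob size are assumptions;
% the actual procedure follows Lab-01).
I     = rgb2gray(imread('digit_sheet.png'));    % hypothetical input file
Is    = imgaussfilt(I, 1);                      % Gaussian filtering to suppress noise
Ig    = imcomplement(Is);                       % invert: dark digits -> bright, like MNIST
BW    = bwareaopen(imbinarize(Ig), 30);         % binary mask with small noise blobs removed
stats = regionprops(BW, 'BoundingBox');         % one bounding box per digit candidate

digitPatches = cell(numel(stats), 1);
for k = 1:numel(stats)
    patch           = imcrop(Ig, stats(k).BoundingBox);   % crop the detected digit
    digitPatches{k} = imresize(patch, [28 28]);            % match the 28 x 28 x 1 network input
end
```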

  • The final step is to classify all detected digits with the two Convolutional Neural Networks (a minimal classification sketch follows the list):

      1. CNNDigitTrainedNet (DigitCNNLayer); and
      2. CNNAugmented180Net (DigitCNNLayer + Rotation Augmented), respectively.
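With the detected digit patches from the fourth step, classification is a single call per net. This is a minimal sketch that reuses the variables from the sketches above; trueLabels is the hand-made ground-truth label vector for the sheet and is an assumed variable.

```matlab
% Stack the detected 28x28 patches into a 4-D array and classify with both nets.
X             = cat(4, digitPatches{:});           % 28 x 28 x 1 x numDigits
predPlain     = classify(CNNDigitTrainedNet, X);   % labels from the plain net
predAugmented = classify(CNNAugmented180Net, X);   % labels from the rotation-augmented net

% Compare against the hand-labelled ground truth (trueLabels is assumed).
confusionchart(trueLabels, predPlain);
accuracy = mean(predPlain == trueLabels);          % fraction of correctly classified digits
```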

3 Results and Discussion

3.1 Train and Calibrate the Two Versions of CNN Nets

3.1.1 CNNDigitTrainedNet(DigitCNNLayer)

CNNDigitTrainedNet(DigitCNNLayer)

  • The accuracy of CNNDigitTrainedNet (DigitCNNLayer) is 98.91%.
  • The loss of CNNDigitTrainedNet (DigitCNNLayer) is 0.0250.

3.1.2 CNNAugmented180Net(DigitCNNLayer + Rotation Augmented)

CNNAugmented180Net(DigitCNNLayer + Rotation Augmented)

  • The accuracy of CNNAugmented180Net (DigitCNNLayer + Rotation Augmented) is 90.63%.
  • The loss of CNNAugmented180Net (DigitCNNLayer + Rotation Augmented) is 0.0950.

3.2 Confusion Matrices of the Calibrated CNNDigitTrainedNet and CNNAugmented180Net

CNNDigitTrainedNet(Left) and CNNAugmented180Net(Right) trained and calibrated with MNIST

  • The confusion matrix shows three important pieces of information:
    • The entries on the diagonal are the numbers of true-positive (correct) classifications.
    • The entries off the diagonal are the numbers of false-positive (wrongly classified) samples.
    • The accuracy computed from the confusion matrix \(CM\) is defined as:
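\[
\text{Accuracy} \;=\; \frac{\sum_{i} CM_{ii}}{\sum_{i,j} CM_{ij}}
\;=\; \frac{\text{number of correctly classified digits}}{\text{total number of classified digits}},
\]
where \(CM_{ij}\) is the number of samples of true class \(i\) that are predicted as class \(j\).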

3.3 Case I: Computer-Generated Digit Images

Confusion Matrix of Computer-Generated classified with CNNDigitTrainedNet(Left); and Search Digits fitted with 9 SquareBox(Right)

  • The pink triangle marked on the right figure indicates the false-positive classification result obtained with CNNDigitTrainedNet.

  • The computational result shows that only one digit on the given computer-generated sheet is wrongly classified by CNNDigitTrainedNet. The accuracy achieved in this classification is \(94.118\%\).

3.4 Case II: Computer-Generated Pre-Rotated Digit Images

Confusion Matrix of Computer-Generated-Rotated classified with CNNAugmented180Net(Left); and Computer-Generated-Rotated Search Digits fitted with 9 SquareBox(Right)

  • The pink triangles marked on the right figure indicate the false-positive classification results obtained with CNNAugmented180Net.

  • CNNAugmented180Net is calibrated with data augmentation using random rotations of up to \(360^{\circ}\) (uniform in \([-180^{\circ}, +180^{\circ}]\)) on the MNIST database. In this experiment, I additionally rotate the whole batch of the given pre-rotated digit images by \(15^{\circ}\) clockwise before running CNNAugmented180Net.

  • The computational results show that only three digits are classified incorrectly by CNNAugmented180Net for the given computer-generated pre-rotated sheet. With the additional \(15^{\circ}\) clockwise batch rotation, the classification accuracy reaches \(82.353\%\).

  • Since the given sheet contains some digits that are already rotated and some that are not, the additional rotation applied to the whole image batch also rotates the non-rotated digits (for example “6”) unnecessarily before CNNAugmented180Net is executed. This explains why the non-rotated digit 6 is wrongly classified as “9”.

  • Besides, the given computer-generated digit 7 (with or without rotation) is often wrongly classified as “1” by both CNNDigitTrainedNet and CNNAugmented180Net. This implies that neither net was trained well enough on the MNIST database to separate classes with similar pixel structures, such as the computer-generated digit 7 and digit 1.

Confusion Matrix of Computer-Generated-Pre-Rotated: classified with CNNAugmented180Net(Left); and classified with CNNDigitTrainedNet(Right)

  • The next two experiments exclude the additional batch rotation of the given pre-rotated digit images; that is, the rotation angle for the whole image batch is set to \(0^{\circ}\).

  • One experiment runs with CNNAugmented180Net (left figure) and the other with CNNDigitTrainedNet (right figure). Their confusion matrices show that both cases obtain low true-positive accuracy on the computer-generated pre-rotated digit images when the additional batch rotation is excluded.

  • Hence, incorporating an additional rotation of the whole batch before running CNNAugmented180Net is the strategy that yields the best true-positive accuracy for the given pre-rotated digits. With the \(15^{\circ}\) clockwise batch rotation, the accuracy on the computer-generated pre-rotated digits reaches \(82.353\%\); a minimal sketch of this batch rotation follows.
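The sketch below reuses digitPatches and CNNAugmented180Net from the Methodology sketches. Note that MATLAB's imrotate rotates counter-clockwise for positive angles, so a \(15^{\circ}\) clockwise pre-rotation uses an angle of -15; for the anti-clockwise rotation used in Case IV, the same sketch applies with preAngle = +15.

```matlab
% Pre-rotate every detected digit patch by a fixed angle, then classify with
% the rotation-augmented net. imrotate is counter-clockwise for positive
% angles, so 15 degrees clockwise means an angle of -15.
preAngle    = -15;
rotated     = cellfun(@(p) imrotate(p, preAngle, 'bilinear', 'crop'), ...
                      digitPatches, 'UniformOutput', false);
predRotated = classify(CNNAugmented180Net, cat(4, rotated{:}));
```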

3.5 Case III: Handwritten Digit Images

Confusion Matrix of HandWritten: classified with CNNDigitTrainedNet(Left); and HandWritten Search Digits fitted with 9 SquareBox(Right)

  • In this case, I run the same procedure as in Case I with CNNDigitTrainedNet only. The true-positive accuracy on the handwritten digit images is found to be \(80\%\). The result also shows that CNNDigitTrainedNet cannot classify the handwritten digit 2 well; the given handwritten digit 2 may have a pixel pattern that does not occur in the MNIST database. Besides, if two digits are written too close together, the 9-square boxes fitted to the detected digits may overlap. This is a crucial problem and causes wrong digit classification.

3.6 Case IV: Handwritten-Pre-Rotated Digit Images

Confusion Matrix of HandWritten-Pre-Rotated: classified with CNNDigitTrainedNet(Left); and HandWritten-Pre-Rotated Search Digits fitted with 9 SquareBox(Right)

Confusion Matrix of HandWritten-Pre-Rotated: classified with CNNDigitTrainedNet(Left); and classified with CNNAugmented180Net(Right)

  • In this case, I repeat the same procedure as in Case II with both CNNDigitTrainedNet and CNNAugmented180Net for comparison. To classify the given handwritten pre-rotated digit images, I first run each net separately without any additional rotation of the whole batch; that is, the batch rotation angle is set to zero. Their confusion matrices show that both nets obtain low true-positive accuracy on the given handwritten pre-rotated digit images.

  • The next experiment adds a \(15^{\circ}\) anti-clockwise rotation of the whole image batch. With this batch rotation, the accuracy on the given handwritten pre-rotated digits is estimated to be \(61.905\%\). Therefore, incorporating an additional batch rotation before running CNNAugmented180Net is again the best strategy for obtaining the optimal accuracy on pre-rotated digits. From this experiment, I also find that the given handwritten pre-rotated digit 2s are hard to classify correctly with either CNNDigitTrainedNet or CNNAugmented180Net.

4 Summary and Conclusion

Computer-generated Digits and Handwritten Digits with/without Pre-Rotation. Pink triangle indicates false-positive classification results

The main task of this mini-project is to implement multi-scale digit image recognition. The implemented algorithm is a Convolutional Neural Network calibrated with the MNIST database for multi-class label classification. I developed two versions of CNN nets, CNNDigitTrainedNet and CNNAugmented180Net; their calibrated true-positive accuracies are 98.91% and 90.63% respectively. I used these calibrated CNN nets to classify the multi-scale digit images in four cases: (I) computer-generated digits (accuracy 94.118%); (II) computer-generated pre-rotated digits (82.353% with a \(15^{\circ}\) clockwise batch rotation); (III) handwritten digits (80%); and (IV) handwritten pre-rotated digits (61.905% with a \(15^{\circ}\) anti-clockwise batch rotation).

Finally, I introduce a strategy for obtaining the optimal true-positive classification accuracy on pre-rotated digit image batches: apply an additional rotation to the whole image batch before running the rotation-augmented CNN net. I verified this strategy in Case II and Case IV, where it gives a significant improvement in the true-positive accuracy of the label classification.