Abstract
In this mini-project, the main task is to implement multi-scale digit image classification for four given cases of digit image files. The implemented algorithms are built on a Convolutional Neural Network (CNN) with fully-connected layers, calibrated on the MNIST database for multi-class label classification. I develop two versions of the CNN: CNNDigitTrainedNet and CNNAugmented180Net. The true-positive accuracy rate is 98.91% for CNNDigitTrainedNet and 90.63% for CNNAugmented180Net. I also introduce a strategy to obtain the optimal accuracy rate by applying an additional rotation to the whole image batch before running the rotation-augmented CNN. This strategy has been verified in both Case II and Case IV, where it gives a significant improvement in the true-positive accuracy rate of label classification.
Computer-generated Digits with/without Pre-Rotation and Handwritten Digits with/without Rotation
The main task of this mini-project is to implement multi-scale digit image recognition for the four given cases of digit image files shown in the above figure. The algorithms I used are built on a Convolutional Neural Network with fully-connected layers, calibrated on the MNIST database for multi-class label classification. Section 2 explains the methodology for constructing two versions of the CNN in Matlab: CNNDigitTrainedNet and CNNAugmented180Net. Results and discussion are in Section 3. The summary and conclusion are in Section 4. Section 5 contains the answers to the Theoretical Part.
Samples of digit images generated from the MNIST dataset: Without Rotation Augmentation (Left); and With Rotation Augmentation (Right)
CNNDigitTrainedNet: DigitCNNLayer
CNNAugmented180Net: DigitCNNLayer + Rotation Augmented
The third step is to set the computational options for the CNN nets, shown as follows:
The fourth step is to search for the digit positions, using Gaussian filtering for subsampling. Each detected digit is then fitted with a 9-square box. Details on how to remove background image noise and invert the image from white to black, how to search for the digit positions, and how to fit a 9-square box to each digit can be found in my Lab-01.
The final step is to classify all detected digits with the two Convolutional Neural Networks:
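The inversion and box-fitting ideas in this step can be illustrated with a minimal sketch. It is not the Lab-01 procedure, which is not reproduced here: the function names `invert` and `enclosing_square`, the threshold of 128, and the choice of a single centered enclosing square are all assumptions made for illustration, and the sketch does not attempt the Gaussian filtering or the exact 9-square box layout.

```python
def invert(img, max_val=255):
    """Invert a grayscale image so a white page becomes a black background."""
    return [[max_val - p for p in row] for row in img]


def enclosing_square(img, threshold=128):
    """Return (top, left, side) of a centered square around all pixels that
    are >= threshold, or None if no such pixel exists.

    The square may extend past the image border; padding is left to the caller.
    """
    rows = [r for r, row in enumerate(img) if any(p >= threshold for p in row)]
    cols = [c for c in range(len(img[0]))
            if any(row[c] >= threshold for row in img)]
    if not rows:
        return None
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    side = max(height, width)
    # Grow the shorter dimension symmetrically so the box becomes square.
    top = rows[0] - (side - height) // 2
    left = cols[0] - (side - width) // 2
    return top, left, side


# Hypothetical 5x5 "page": white background with a dark vertical stroke.
page = [[255] * 5 for _ in range(5)]
for r in (1, 2, 3):
    page[r][2] = 0

box = enclosing_square(invert(page))
print(box)  # (1, 1, 3)
```

A square crop is convenient because the MNIST-trained networks expect square inputs, so the crop can be resized without distorting the digit's aspect ratio.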
CNNDigitTrainedNet(DigitCNNLayer)
CNNAugmented180Net(DigitCNNLayer + Rotation Augmented)
CNNDigitTrainedNet(Left) and CNNAugmented180Net(Right) trained and calibrated with MNIST
Confusion Matrix of Computer-Generated classified with CNNDigitTrainedNet(Left); and Search Digits fitted with 9 SquareBox(Right)
The pink triangle marked on the right figure indicates the false-positive classification result from CNNDigitTrainedNet.
The computational result shows that CNNDigitTrainedNet misclassifies only one digit in the given Computer-Generated paper. The accuracy achieved in this classification is \(94.118\%\).
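As a check on the arithmetic, the accuracy can be read off a confusion matrix as the sum of the diagonal divided by the total count. The sketch below is illustrative only: the per-digit counts are hypothetical and chosen so that 16 of 17 digits are correct, which is the ratio consistent with the reported \(94.118\%\); the actual confusion matrix is the one in the figure.

```python
def accuracy_from_confusion(cm):
    """Percentage of samples on the diagonal of a square confusion matrix.

    cm[i][j] is the number of samples with true label i predicted as label j.
    """
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


# Hypothetical counts per true digit 0..9 (17 samples in total).
counts = [2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
cm = [[0] * 10 for _ in range(10)]
for digit, n in enumerate(counts):
    cm[digit][digit] = n

# Assume a single error: one "7" predicted as "1".
cm[7][7] -= 1
cm[7][1] += 1

acc = accuracy_from_confusion(cm)
print(round(acc, 3))  # 94.118
```

The same function gives \(82.353\%\) for 14 correct out of 17 and \(61.905\%\) for 13 correct out of 21, which match the other accuracies reported in this section.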
Confusion Matrix of Computer-Generated-Rotated classified with CNNAugmented180Net(Left); and Computer-Generated-Rotated Search Digits fitted with 9 SquareBox(Right)
The pink triangle marked on the right figure indicates the false-positive classification results from CNNAugmented180Net.
CNNAugmented180Net is calibrated with data augmentation that applies a random rotation in the range [-180, +180] degrees (a full \(360^{\circ}\) span) to the MNIST database. In this experiment, I apply an additional rotation of \(15^{\circ}\) clockwise to the whole batch of given rotated-digit images before running CNNAugmented180Net.
The computational results show that CNNAugmented180Net misclassifies only three digits in the given computer-generated-pre-rotated paper. In this classification, the optimal accuracy reaches \(82.353\%\) when an additional batch rotation of \(15^{\circ}\) clockwise is applied.
The given rotated-digit images contain some digits that are already rotated and some that are not. The additional rotation of the whole image batch therefore also rotates the non-rotated digits (say “6”), which do not need it, before CNNAugmented180Net is executed. This explains why the non-rotated digit 6 is wrongly classified as “9”.
Besides, I found that the given computer-generated digit 7 (with or without rotation) is often wrongly classified as “1” by both CNNDigitTrainedNet and CNNAugmented180Net. This implies that neither network was trained well enough on the MNIST database to separate digits with a similar pixel structure (rotated or non-rotated), such as the computer-generated digit 7 and digit 1.
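The batch rotation described above can be sketched as a nearest-neighbour rotation about the image center. This is a minimal illustration under stated assumptions, not the Matlab routine used in the experiments: the function name `rotate_clockwise`, the sign convention (positive angles rotate clockwise as seen on screen, with row indices increasing downward), and the zero fill for uncovered pixels are all choices made here.

```python
import math


def rotate_clockwise(img, angle_deg, fill=0):
    """Rotate a 2D list image clockwise about its center (nearest neighbour).

    Each output pixel is mapped back to its source location with the inverse
    rotation; pixels whose source falls outside the image get `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            x, y = c - cx, r - cy          # offsets from center, y points down
            src_x = x * cos_a + y * sin_a  # inverse of the clockwise rotation
            src_y = -x * sin_a + y * cos_a
            sr, sc = int(round(cy + src_y)), int(round(cx + src_x))
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out


def rotate_batch(images, angle_deg):
    """Apply the same rotation to every image in the batch."""
    return [rotate_clockwise(img, angle_deg) for img in images]


# A single bright pixel to the right of center moves to below center
# after a 90 degree clockwise turn.
dot = [[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0]]
turned = rotate_clockwise(dot, 90)
print(turned)  # [[0, 0, 0], [0, 0, 0], [0, 1, 0]]
```

Under this convention, the experiment above would correspond to `rotate_batch(images, 15)`, and an anti-clockwise rotation would use a negative angle.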
Confusion Matrix of Computer-Generated-Pre-Rotated: classified with CNNAugmented180Net(Left); and classified with CNNDigitTrainedNet(Right)
The next two experiments exclude the additional batch rotation of the given rotated-digit images. That is, the angle of rotation for the whole image batch is set to \(0^{\circ}\).
One experiment runs with CNNAugmented180Net (left figure), whereas the other runs with CNNDigitTrainedNet (right figure). Their confusion matrices indicate that both cases obtain a low true-positive accuracy for the computer-generated-pre-rotated digit images when the additional batch rotation is excluded.
Hence, applying an additional batch rotation before running CNNAugmented180Net obtains the optimal true-positive accuracy for the given pre-rotated digits. For the given computer-generated-pre-rotated digits, the optimal accuracy is found to be \(82.353\%\) with an additional batch rotation of \(15^{\circ}\) clockwise.
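One way to picture the strategy is as a small search over candidate batch angles, keeping the angle that gives the highest accuracy. The source reports only the \(0^{\circ}\) and \(15^{\circ}\) settings and does not state how the angle was chosen, so the search loop, the function name `best_batch_rotation`, and the toy `rotate` and `classify` stand-ins below are assumptions for illustration and are not the trained networks.

```python
def best_batch_rotation(images, labels, classify, rotate, angles):
    """Try each batch angle and return (best_angle, best_accuracy_percent)."""
    best_angle, best_acc = None, -1.0
    for angle in angles:
        preds = [classify(rotate(img, angle)) for img in images]
        hits = sum(p == y for p, y in zip(preds, labels))
        acc = 100.0 * hits / len(labels)
        if acc > best_acc:
            best_angle, best_acc = angle, acc
    return best_angle, best_acc


# Toy stand-ins: an "image" is (true_digit, tilt_in_degrees), where a
# negative tilt means the digit leans anti-clockwise.
def toy_rotate(img, angle):
    digit, tilt = img
    return digit, tilt + angle


def toy_classify(img):
    digit, tilt = img
    # Pretend the classifier only succeeds on nearly upright digits.
    return digit if abs(tilt) <= 10 else -1


images = [(3, -15), (8, -15), (4, -20), (6, 0)]  # the last digit is not pre-rotated
labels = [3, 8, 4, 6]

result = best_batch_rotation(images, labels, toy_classify, toy_rotate,
                             angles=[-15, 0, 15])
print(result)  # (15, 75.0)
```

In this toy batch the 15 degree rotation fixes the three tilted digits but pushes the upright one past the classifier's tolerance, which mirrors the trade-off noted for the non-rotated digit 6.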
Confusion Matrix of HandWritten: classified with CNNDigitTrainedNet(Left); and HandWritten Search Digits fitted with 9 SquareBox(Right)
Confusion Matrix of HandWritten-Pre-Rotated: classified with CNNDigitTrainedNet(Left); and HandWritten-Pre-Rotated Search Digits fitted with 9 SquareBox(Right)
Confusion Matrix of HandWritten-Pre-Rotated: classified with CNNDigitTrainedNet(Left); and classified with CNNAugmented180Net(Right)
In this section, I repeat the same procedure as in Case II with both CNNDigitTrainedNet and CNNAugmented180Net for comparison. To classify the given handwritten-pre-rotated digit images, I run CNNAugmented180Net and CNNDigitTrainedNet separately without the additional batch rotation. That is, the angle of rotation is set to zero. Their confusion matrices indicate that both cases obtain a low true-positive accuracy for the given handwritten-pre-rotated digit images.
The next experiment includes an additional rotation of the whole image batch by \(15^{\circ}\) anti-clockwise. With this additional batch rotation, the optimal accuracy for the given handwritten-pre-rotated digits is estimated to be \(61.905\%\). Therefore, applying an additional batch rotation before running CNNAugmented180Net is the best strategy to obtain the optimal accuracy for handwritten-pre-rotated digits. From this experiment, I also found that the given handwritten-pre-rotated digit 2s are hard to classify correctly with either CNNDigitTrainedNet or CNNAugmented180Net.
Computer-generated Digits and Handwritten Digits with/without Pre-Rotation. Pink triangle indicates false-positive classification results
The main task of this mini-project is to implement multi-scale digit image recognition. The implemented algorithms are built on a Convolutional Neural Network, calibrated on the MNIST database for multi-class label classification. I develop two versions of the CNN: CNNDigitTrainedNet and CNNAugmented180Net. The calibrated true-positive accuracy is 98.91% for CNNDigitTrainedNet and 90.63% for CNNAugmented180Net. I used these calibrated CNN nets to classify multi-scale digit images for four cases:
Finally, I introduce a strategy to obtain the optimal true-positive classification accuracy for batches of pre-rotated digit images by applying an additional rotation to the whole image batch before running the rotation-augmented CNN. I have verified this strategy in Case II and Case IV, where it gives a significant improvement in the true-positive accuracy rate of label classification.