Article reviewed: Gradient-Based Learning Applied to Document Recognition

Reference: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

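Summary: The paper addresses the handwritten digit recognition task and argues for convolutional neural networks (CNNs), which are specifically designed to deal with the variability of 2D shapes.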

Five C’s:
Category:
Context:
Correctness:
Contributions:
Clarity:

Outline:

####1. Handwritten character recognition task

#####A. Learning from the data

Structural risk minimization: minimize \[E_{train} + \beta H(W)\] where \(H(W)\) is a regularization function and \(\beta\) is a constant.
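
As a minimal sketch of this objective, the snippet below adds a regularization term to a training error. The notes do not fix a particular H(W); the squared L2 norm used here, and the names `regularized_loss`, `train_error`, and `beta`, are assumptions made only for illustration.

```python
def regularized_loss(train_error, weights, beta):
    """Return E_train + beta * H(W)."""
    # Assumption: H(W) is the squared L2 norm of the weights.
    h = sum(w * w for w in weights)
    return train_error + beta * h


# Hypothetical values: training error 0.5, two weights, beta = 0.1.
print(regularized_loss(0.5, [1.0, 2.0], beta=0.1))
```

A larger `beta` penalizes large weights more strongly, trading training error against model capacity.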

#####B. Gradient-Based Learning

Minimization is easier when the loss is a smooth, continuous function of the parameters. \[W_k = W_{k-1} - \epsilon \frac{\partial E(W)}{\partial W}\]
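
The update rule can be sketched in a few lines. This is an illustrative example on a scalar parameter, not code from the paper; the quadratic loss E(W) = (W - 3)^2 and the names `gradient_descent`, `grad`, and `w0` are assumptions.

```python
def gradient_descent(grad, w0, epsilon=0.1, steps=100):
    """Iterate W_k = W_{k-1} - epsilon * dE/dW for a scalar parameter."""
    w = w0
    for _ in range(steps):
        w = w - epsilon * grad(w)
    return w


# Hypothetical loss E(W) = (W - 3)^2, whose gradient is 2 * (W - 3).
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star)  # approaches the minimizer W = 3
```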

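#####C. Gradient Back-Propagation

⭐️ The most widely used learning procedure in NNs! Gradient-based learning was previously limited to linear systems. In practice, local minima do not appear to be a major problem for multi-layer NNs.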

#####D. Learning in Real Handwriting Recognition Systems

Heuristic Over-Segmentation: separate characters from their neighbors. Using a CNN saves computational cost.

#####E. Globally Trainable Systems

Train the entire system to minimize the global error rather than the error on individual letters.

* Jacobian matrix: each module must be differentiable almost everywhere so that gradients can be back-propagated through it.

####2. CNN for isolated character recognition

Problems:

- Large images, too many weights, memory issues, information loss in normalization, weight replication
- CNNs force the extraction of local features by replicating hidden units across the image

LeNet-5: characters are centered in the input so that potential distinctive features can appear in the center of the receptive fields.
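
Local feature extraction with replicated units can be sketched as a plain 2D convolution: one small kernel (one shared set of weights) is applied at every position. This is a pure-Python illustration rather than the paper's implementation; `conv2d_valid` and the all-ones test data are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Apply one shared kernel at every position where it fully fits."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every (i, j).
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 32x32 input with a 5x5 kernel yields a 28x28 feature map.
image = [[1.0] * 32 for _ in range(32)]
kernel = [[1.0] * 5 for _ in range(5)]
feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))
```

Here the layer has only 25 weights regardless of image size, which is the point of weight sharing: a fully connected unit would need one weight per input pixel.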

Loss Function: the Maximum Likelihood Estimation (MLE) criterion, which in this setting is equivalent to the Mean Squared Error (MSE)

####3. Comparison of several techniques

Database: Modified NIST (MNIST) set

- Linear Classifier, Pairwise Linear Classifier
- Baseline Nearest Neighbor Classifier
- PCA and Polynomial Classifier
- Radial Basis Function Network
- Two-Hidden-Layer Fully Connected Multilayer NN
- LeNet-1, LeNet-4
- Tangent Distance Classifier
- SVM

📝 Boosting improves accuracy

####4. From isolated letters to sentences + GTN

Multi-module system: simultaneously trains on letters, words, and numbers. Each module implements fprop (forward propagation) and bprop (backward propagation).
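
A minimal sketch of the fprop/bprop idea, assuming two toy modules chained together. The classes `Scale` and `SquaredError`, the helper `train_step`, and all numeric values are hypothetical and only illustrate how gradients flow backward through a stack of modules.

```python
class Scale:
    """Module computing y = w * x with one trainable scalar weight w."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def fprop(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x

    def bprop(self, grad_y):
        self.grad_w = grad_y * self.x  # dE/dw
        return grad_y * self.w         # dE/dx, passed to the previous module


class SquaredError:
    """Loss module computing E = 0.5 * (y - target)^2."""

    def __init__(self, target):
        self.target = target

    def fprop(self, y):
        self.y = y
        return 0.5 * (y - self.target) ** 2

    def bprop(self, grad_e):
        return grad_e * (self.y - self.target)  # dE/dy


def train_step(modules, x, epsilon):
    """Run fprop through all modules, bprop in reverse, then update weights."""
    out = x
    for module in modules:
        out = module.fprop(out)
    grad = 1.0
    for module in reversed(modules):
        grad = module.bprop(grad)
    for module in modules:
        if hasattr(module, "w"):
            module.w -= epsilon * module.grad_w
    return out


scale = Scale(w=0.0)
loss = SquaredError(target=2.0)
for _ in range(200):
    train_step([scale, loss], x=1.0, epsilon=0.1)
print(scale.w)  # approaches 2.0, the weight that maps x = 1 to the target
```

Because every module exposes the same two methods, the whole stack is trained against the final error rather than each module being tuned separately.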

####5. Classical method of heuristic over-segmentation

Given a string of text, segment the string into individual character images. Segmentation graph: a Directed Acyclic Graph (DAG), processed by a Recognition Transformer and a Viterbi Transformer.
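
The Viterbi step over such a graph can be sketched as a lowest-penalty path search in a DAG. This is an illustrative sketch under stated assumptions: nodes are cut points numbered left to right, each arc is a hypothetical tuple `(src, dst, label, penalty)` with `src < dst`, and the penalties are made up.

```python
def viterbi_path(num_nodes, arcs):
    """Return (penalty, labels) of the lowest-penalty path from node 0
    to node num_nodes - 1. Assumes src < dst for every arc."""
    inf = float("inf")
    best = [inf] * num_nodes
    back = [None] * num_nodes
    best[0] = 0.0
    # Visiting arcs by increasing source node respects topological order.
    for src, dst, label, penalty in sorted(arcs, key=lambda arc: arc[0]):
        candidate = best[src] + penalty
        if candidate < best[dst]:
            best[dst] = candidate
            back[dst] = (src, label)
    # Follow back-pointers from the end node to recover the labels.
    labels = []
    node = num_nodes - 1
    while back[node] is not None:
        node, label = back[node]
        labels.append(label)
    labels.reverse()
    return best[num_nodes - 1], labels


# Hypothetical graph: either two segments ("3", "4") or one wide segment ("8").
arcs = [(0, 1, "3", 0.5), (1, 2, "4", 0.25), (0, 2, "8", 2.0)]
print(viterbi_path(3, arcs))  # (0.75, ['3', '4'])
```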

####6. Discriminative and non-discriminative gradient-based techniques

  • Viterbi Training: train so that the path associated with the correct label sequence has the lowest penalty
  • Forward Scoring and Forward Training
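
Where Viterbi keeps only the single best path, forward scoring combines the penalties of all paths with a "log-add", that is, -log of the sum of exp(-penalty). The sketch below assumes a DAG whose arcs are hypothetical `(src, dst, penalty)` tuples with `src < dst`; the function names are illustrative.

```python
import math


def log_add(a, b):
    """Return -log(exp(-a) + exp(-b)), computed in a numerically stable way."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    m = min(a, b)
    return m - math.log(math.exp(m - a) + math.exp(m - b))


def forward_penalty(num_nodes, arcs):
    """Combine the penalties of all paths from node 0 to the last node."""
    f = [math.inf] * num_nodes
    f[0] = 0.0
    # Visiting arcs by increasing source node respects topological order.
    for src, dst, penalty in sorted(arcs, key=lambda arc: arc[0]):
        if f[src] != math.inf:
            f[dst] = log_add(f[dst], f[src] + penalty)
    return f[num_nodes - 1]


# Hypothetical graph with two competing paths (penalties 0.75 and 2.0).
arcs_fwd = [(0, 1, 0.5), (1, 2, 0.25), (0, 2, 2.0)]
print(forward_penalty(3, arcs_fwd))
```

The forward penalty is never larger than the Viterbi penalty, since every additional path can only increase the summed likelihood.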

####7. Space Displacement NN

Sweep the recognizer over all possible locations across a normalized image.

- SDNN: Space Displacement Neural Network
- Finite-state transducers
- SDNN + LeNet-5
- Global training of the SDNN
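
The sweeping idea can be sketched naively as a sliding window over the columns of a wide image. This is only an illustration: `sweep_recognizer`, `window_width`, `stride`, and the stand-in `recognizer` callable are hypothetical, and a convolutional implementation would share computation between overlapping windows instead of re-running the recognizer from scratch.

```python
def sweep_recognizer(image, window_width, stride, recognizer):
    """Apply `recognizer` to every window of `window_width` columns."""
    width = len(image[0])
    outputs = []
    for left in range(0, width - window_width + 1, stride):
        window = [row[left:left + window_width] for row in image]
        outputs.append((left, recognizer(window)))
    return outputs


# Stand-in recognizer (an assumption): total ink in the window.
def ink(window):
    return sum(sum(row) for row in window)


wide_image = [[0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]]
print(sweep_recognizer(wide_image, window_width=4, stride=2, recognizer=ink))
```

The result is a sequence of recognizer outputs indexed by position, which a later stage still has to interpret as a character string.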

####8. Trainable GTN for multiple generalized transductions

Presents the GTN as a generalized transduction and proposes a powerful graph composition algorithm. Finite values looking for arc. Hidden Markov Models.

####9. Globally trained GTN on a pen PC (immediate feedback)

The network and the GTN are jointly trained to minimize an error measure defined at the word level.

- Segmentation Transformer + Character Recognition Transformer + Composition Transformer + Beam Search Transformer

####10. LeNet-5: complete GTN (bank check)

Using Gradient-Based Learning and GTNs makes this deployment fast and cost-effective while yielding an accurate and reliable solution.

Discussion:
The dataset / training set is given; there is no real-world validation.

Innovations:
Combined all the techniques together

Assumptions:
English or Western letters. \[f(a) = A \tanh(Sa)\]
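
A small sketch of this scaled hyperbolic tangent follows. The notes do not give values for A and S; the constants A = 1.7159 and S = 2/3 used below are an assumption, chosen so that f(1) is approximately 1 and f(-1) is approximately -1.

```python
import math

A = 1.7159     # assumed amplitude
S = 2.0 / 3.0  # assumed slope at the origin


def squash(a, amplitude=A, slope=S):
    """Scaled hyperbolic tangent f(a) = A * tanh(S * a)."""
    return amplitude * math.tanh(slope * a)


print(squash(1.0), squash(-1.0), squash(0.0))
```

With these constants the function saturates at +A and -A while staying close to the identity around +1 and -1.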

Terminology:

CNN, Gradient-Based Learning, Graph Transformer Network, SVM, KNN