Neural Network Case Study : Recognizing Hand-Written Digits (Multilayer Perceptron):
Neuron : holds some number in \([0,1]\) = its “Activation”
Each pixel is a gray-scale value of the digit image
The pixels of the input image make up the first layer of the network
Hidden Layers : Layers in between
The last layer gives the level of confidence for each of the possible digits : 0 - 9
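As a concrete sketch of that structure (a minimal setup assuming the classic 28×28 gray-scale input; the hidden-layer sizes below are purely illustrative, since the notes don't fix any of these numbers):

```python
# Layer sizes are illustrative assumptions -- the notes above don't fix them.
input_layer = 28 * 28        # 784 activations, one per pixel, each in [0, 1]
hidden_layers = [16, 16]     # the "layers in between"; sizes chosen for illustration
output_layer = 10            # one confidence value per digit 0-9

layer_sizes = [input_layer, *hidden_layers, output_layer]
print(layer_sizes)           # [784, 16, 16, 10]
```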
Why should we expect a layered structure to behave intelligently?
When we look at the digits (0,1,2,3,4,5,6,7,8,9), there are characteristic features we recognize, which allow us to group similar-looking digits and then pick which is which based on more specific attributes.
So maybe we have a neuron which activates when it sees a long straight line, telling us the digit can’t be a round one like 0, 3 or 6.
From layer to layer, the image is broken down into progressively broader patterns, building from small pieces toward whole digits.
Weighing Branches :
Suppose \(n_i\) represents the activation of neuron \(i\) in the previous layer \(\implies\) \[ \sum_i w_i n_i = \text{Weighted Sum} \]
Then we normalize it : \(f\left(\sum_i w_i n_i\right) \in [0,1]\)
Commonly, that normalizing function is the sigmoid function :
\[ f(x) = \sigma(x) = \frac{1}{1+e^{-x}} \]
As you can tell, its outputs range over \((0,1)\) (the logistic curve): any real input gets squashed into that interval.
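A minimal Python sketch of this squashing behaviour (nothing here beyond the formula above):

```python
import math

def sigmoid(x: float) -> float:
    """The logistic curve: squashes any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Very negative sums land near 0, very positive sums near 1:
print(sigmoid(-10))  # ~0.0000454
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ~0.9999546
```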
By applying this function ( \(\sigma(x)\) ) to our weighted sum, we measure how positive the weighted sum is (roughly, how many of the relevant pixels are white).
Suppose, however, we only want the neuron to be active when the weighted sum is beyond 10; we would then adjust with a bias.
\[ \sum_i w_i n_i - 10 = \sum_i w_i n_i - \text{bias} \implies \sigma\left(\sum_i w_i n_i - \text{bias}\right) \in [0,1] \]
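A sketch of the full single-neuron computation under the notes' convention of subtracting the bias; the weights and activations below are hypothetical example values, and the bias of 10 comes from the example above:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def neuron_activation(weights, activations, bias):
    """sigma(sum_i w_i * n_i - bias), following the notes' convention
    of subtracting the bias from the weighted sum."""
    weighted_sum = sum(w * n for w, n in zip(weights, activations))
    return sigmoid(weighted_sum - bias)

# Hypothetical weights and pixel activations; the bias of 10 is the example above.
weights = [4.0, 4.0, 4.0]
activations = [0.9, 0.95, 0.8]
print(neuron_activation(weights, activations, bias=10))  # ~0.65: sum 10.6 barely clears the bias
```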
Every neuron is connected to every neuron in the previous layer, and each neuron has its own bias.
When we talk about the neural network “learning”, what we mean is that it picks specific weights & biases s.t. it recognizes the right patterns and solves the problem at hand.
\[ \text{Let } \vec{n}_i \text{ represent ACTIVATION values} \implies \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix} \vec{n}_i + \vec{b} = W^{n \times m} \, \vec{n}_i^{\,m \times 1} + \vec{b} = \vec{W}_{\text{sum}} \]
Where \(\vec{n}_i\) represents the activation values of a specific layer ( \(i\) ), \(W\) holds the weights, and \(\vec{b}\) the biases; each row of \(W\) holds the weighted connections into a specific neuron of the next layer.
Then, last but not least, we apply the sigmoid function to each row (neuron) of the result to normalize it :
\[ \sigma(\vec{W}_{\text{sum}}) = \sigma(W \vec{n} + \vec{b}) \]
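A NumPy sketch of this whole layer computation; the shapes follow the \(n \times m\) convention above, and the values are random placeholders rather than learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # element-wise: one normalized value per neuron

def layer_forward(W, n, b):
    """sigma(W @ n + b) with W of shape (n_next, m), n of shape (m,), b of shape (n_next,)."""
    return sigmoid(W @ n + b)

# Placeholder shapes and random values (not learned weights): a layer of
# m = 784 activations feeding 16 neurons in the next layer.
m, n_next = 784, 16
W = rng.standard_normal((n_next, m))  # row k: weights into neuron k of the next layer
b = rng.standard_normal(n_next)       # one bias per next-layer neuron
n = rng.random(m)                     # previous layer's activations, each in [0, 1]

next_n = layer_forward(W, n, b)
print(next_n.shape)                   # (16,) -- each entry in (0, 1)
```

Applying `layer_forward` once per layer, feeding each output vector in as the next layer's activations, gives the network's whole forward pass.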