Neural Networks

Elements of Statistical Learning: Chapter 11

Brynjólfur Gauti Jónsson, MS student in statistics

This Lecture

How to get pretty good

How to get really good

Geoffrey Hinton

Study Material

From Linear Models to Neurons

The multilayer perceptron

\[ y = \varphi(\sum w_i x_i + b) = \begin{cases} 1, \hspace{5mm} \sum w_ix_i > -b \\ 0, \hspace{5mm} \sum w_ix_i \leq -b \end{cases} \]

What we call a multilayer perceptron (MLP) today is not literally built from perceptrons: it replaces the step function above with a differentiable non-linearity such as the sigmoid, tanh or ReLU.
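The thresholding rule above can be sketched in a few lines of NumPy. This is only an illustration: the function name `perceptron` and the weights chosen to mimic a logical AND are assumptions, not part of the lecture.

```python
import numpy as np

def perceptron(x, w, b):
    """Step-function neuron: 1 if sum(w_i * x_i) > -b, else 0."""
    return int(np.dot(w, x) + b > 0)

# Hypothetical weights and bias that make the neuron compute logical AND.
w = np.array([1.0, 1.0])
b = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(np.array(x), w, b) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```

Because the step function has zero gradient almost everywhere, it cannot be trained by gradient descent, which is one reason modern networks use smooth or piecewise-linear activations instead.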

From neurons: The MLP

Image from Computer Age Statistical Inference by Efron and Hastie

Forward propagation

We have data \(X\), stored with one observation per column, and a network with \(L\) layers. Layer \(\ell\) has \(n_\ell\) nodes, so the weight matrix \(W^{(\ell)}\) has dimension \(n_{\ell} \times n_{\ell - 1}\): it takes \(n_{\ell - 1}\) rows as input and outputs \(n_{\ell}\) rows.

Set \(A^{(0)} = X\) and we get for each \(\ell\) in \(1:L\)

\[ \begin{aligned} Z^{(\ell)} &= W^{(\ell)}A^{(\ell - 1)} + b^{(\ell)} \\ A^{(\ell)} &= f^{(\ell)}(Z^{(\ell)}) \end{aligned} \]

In particular

\[ \hat Y = A^{(L)} \]
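A minimal NumPy sketch of the forward pass, under stated assumptions: observations are stored as columns, so that the product \(W^{(\ell)}A^{(\ell - 1)}\) is defined with \(W^{(\ell)}\) of shape \(n_\ell \times n_{\ell - 1}\). The layer widths, the ReLU hidden activation and the identity output activation are illustrative choices, not something the slides prescribe.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def forward(X, weights, biases, activations):
    """Return lists Z and A so that Z[l], A[l] belong to layer l; A[0] is the input."""
    A = [X]
    Z = [None]  # placeholder so that list indices match layer numbers
    for W, b, f in zip(weights, biases, activations):
        z = W @ A[-1] + b   # b has shape (n_l, 1) and broadcasts over the columns
        Z.append(z)
        A.append(f(z))
    return Z, A

rng = np.random.default_rng(0)
sizes = [3, 4, 1]   # n_0, n_1, n_2 (hypothetical layer widths)
m = 5               # number of observations, one per column
X = rng.normal(size=(sizes[0], m))
weights = [rng.normal(size=(sizes[l], sizes[l - 1])) for l in range(1, len(sizes))]
biases = [np.zeros((sizes[l], 1)) for l in range(1, len(sizes))]

Z, A = forward(X, weights, biases, [relu, identity])
Y_hat = A[-1]
print(Y_hat.shape)  # (1, 5)
```

Keeping every \(Z^{(\ell)}\) and \(A^{(\ell)}\) is deliberate: backpropagation reuses them.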

Backpropagation

Having obtained our predictions, our error for each observation is

\[ \mathcal L = (y_i - \hat y_i)^2 = (y_i - A^{(L)})^2 \]

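For each weight matrix we need \(\frac{\partial \mathcal L}{\partial W^{(\ell)}}\). Writing \(dV\) as shorthand for \(\frac{\partial \mathcal L}{\partial V}\) and \(*\) for the elementwise product, we start from \(dA^{(L)} = -2(y_i - A^{(L)})\) and apply the chain rule for \(\ell = L, \dots, 1\):

\[ \begin{aligned} dZ^{(\ell)} &= dA^{(\ell)} * f^{(\ell)'}(Z^{(\ell)}) \\ dW^{(\ell)} &= dZ^{(\ell)} A^{(\ell - 1)T} \\ db^{(\ell)} &= \text{rowSums}(dZ^{(\ell)}) \\ dA^{(\ell - 1)} &= W^{(\ell)T}dZ^{(\ell)} \end{aligned} \]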

To see why this is true:

\[ \begin{aligned} \frac{\partial \mathcal L}{\partial W^{(L)}} &= \frac{\partial \mathcal L}{\partial Z^{(L)}}\frac{\partial Z^{(L)}}{\partial W^{(L)}} \\ &= \frac{\partial \mathcal L}{\partial A^{(L)}}\frac{\partial A^{(L)}}{\partial Z^{(L)}}\frac{\partial Z^{(L)}}{\partial W^{(L)}} \\ &= \left[-2(y_i - A^{(L)}) * f^{(L)'}(Z^{(L)})\right]A^{(L - 1)T} \\ \frac{\partial \mathcal L}{\partial A^{(L - 1)}} &= \frac{\partial \mathcal L}{\partial A^{(L)}}\frac{\partial A^{(L)}}{\partial Z^{(L)}}\frac{\partial Z^{(L)}}{\partial A^{(L - 1)}} \\ &= W^{(L)T}\left[-2(y_i - A^{(L)}) * f^{(L)'}(Z^{(L)})\right] \end{aligned} \]
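One way to convince yourself of the recursion is to code it and compare against a finite-difference gradient. The sketch below assumes tanh hidden layers, an identity output layer, squared error summed over observations, and observations stored as columns; all function names and layer sizes are hypothetical.

```python
import numpy as np

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def forward(X, Ws, bs):
    """tanh hidden layers, identity output layer."""
    A, Z = [X], [None]
    L = len(Ws)
    for l in range(1, L + 1):
        Z.append(Ws[l - 1] @ A[-1] + bs[l - 1])
        A.append(np.tanh(Z[-1]) if l < L else Z[-1])
    return Z, A

def loss(X, Y, Ws, bs):
    _, A = forward(X, Ws, bs)
    return np.sum((Y - A[-1]) ** 2)

def backward(X, Y, Ws, bs):
    Z, A = forward(X, Ws, bs)
    L = len(Ws)
    dWs, dbs = [None] * L, [None] * L
    dA = -2.0 * (Y - A[L])                          # dL/dA^(L) for squared error
    for l in range(L, 0, -1):
        fprime = np.ones_like(Z[l]) if l == L else tanh_prime(Z[l])
        dZ = dA * fprime                            # elementwise product
        dWs[l - 1] = dZ @ A[l - 1].T
        dbs[l - 1] = dZ.sum(axis=1, keepdims=True)  # rowSums
        dA = Ws[l - 1].T @ dZ                       # handed to layer l - 1
    return dWs, dbs

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))
Y = rng.normal(size=(1, 6))
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
bs = [rng.normal(size=(4, 1)), rng.normal(size=(1, 1))]
dWs, dbs = backward(X, Y, Ws, bs)

# Central finite difference on a single entry of W^(1).
eps = 1e-6
W_plus = [W.copy() for W in Ws]
W_minus = [W.copy() for W in Ws]
W_plus[0][2, 1] += eps
W_minus[0][2, 1] -= eps
numeric = (loss(X, Y, W_plus, bs) - loss(X, Y, W_minus, bs)) / (2 * eps)
print(numeric, dWs[0][2, 1])  # the two numbers should agree closely
```

A gradient check like this is slow, so it is typically used only to debug a hand-written backward pass, not during training.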

Fitting MLPs

Adam
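The slide only names the optimiser, so the following is a hedged sketch of the standard Adam update (Kingma and Ba) with its commonly used default hyperparameters. The function name `adam_step`, the toy quadratic objective and the larger learning rate used in the loop are assumptions made for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimise f(theta) = sum((theta - target)^2).
target = np.array([3.0, -2.0])
theta = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = 2 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # should end up close to the target
```

Dividing by the root of the squared-gradient average gives each parameter its own effective step size, which is why Adam tends to need less learning-rate tuning than plain gradient descent.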

Both images from Deep Learning, Goodfellow, Bengio, Courville

ConvNets

Convolutional Neural Networks use filters which they slide across the image to detect features.

Image from Computer Age Statistical Inference by Efron and Hastie

Convolution and filters

\[ \begin{aligned} F_h &= \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A \\ F_v &= \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * A \\ F &= \sqrt{F_h^2 + F_v^2} \end{aligned} \]

Instead of using handmade filters, ConvNets learn the parameters of the filters.
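To make the handmade case concrete, here is a small sketch that slides the two filters above over a toy image and combines them into an edge magnitude. It assumes a "valid" sliding window without padding and uses cross-correlation (no kernel flip), which is what ConvNet layers usually compute; the helper `filter2d` and the toy image are hypothetical.

```python
import numpy as np

def filter2d(image, kernel):
    """'Valid' sliding-window filter (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

K_h = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
K_v = K_h.T  # [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

# Toy image: dark left half, bright right half, so a single vertical edge.
A = np.zeros((6, 6))
A[:, 3:] = 1.0

F_h = filter2d(A, K_h)
F_v = filter2d(A, K_v)
F = np.sqrt(F_h ** 2 + F_v ** 2)
print(F)  # every row is [0, 4, 4, 0]: the response is non-zero only at the edge
```

A convolutional layer does the same sliding computation, except that the nine numbers in each kernel are parameters fitted by backpropagation.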

“The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.” - Geoffrey Hinton

Sequence Models

\[ y_t = f(y_{t-1}, x_t) \]

Image from Deep Learning with R by Abhijit Ghatak

Long Short Term Memory

\[ \begin{aligned} \Gamma_u &= \sigma(W_u[a^{(t - 1)}, x^{(t)}] + b_u) \\ \Gamma_f &= \sigma(W_f[a^{(t - 1)}, x^{(t)}] + b_f) \\ \Gamma_o &= \sigma(W_o[a^{(t - 1)}, x^{(t)}] + b_o) \\ \tilde c^{(t)} &= \tanh(W_c[a^{(t - 1)}, x^{(t)}] + b_c)\\ c^{(t)} &= \Gamma_u * \tilde c^{(t)} + \Gamma_f * c^{(t - 1)}\\ a^{(t)} &= \Gamma_o * \tanh(c^{(t)}) \end{aligned} \]
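A single LSTM time step can be sketched directly from the gate equations. In this sketch the forget gate scales the previous cell state \(c^{(t - 1)}\), as in the standard LSTM. The parameter dictionary, the dimensions `n_a` and `n_x`, and the random inputs are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, params):
    concat = np.concatenate([a_prev, x_t])   # [a^(t-1), x^(t)]
    gamma_u = sigmoid(params["W_u"] @ concat + params["b_u"])   # update gate
    gamma_f = sigmoid(params["W_f"] @ concat + params["b_f"])   # forget gate
    gamma_o = sigmoid(params["W_o"] @ concat + params["b_o"])   # output gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])   # candidate cell state
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t

rng = np.random.default_rng(0)
n_a, n_x = 4, 3   # hypothetical hidden and input dimensions
params = {}
for gate in "ufoc":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_a, n_a + n_x))
    params[f"b_{gate}"] = np.zeros(n_a)

a, c = np.zeros(n_a), np.zeros(n_a)
for x_t in rng.normal(size=(5, n_x)):   # a sequence of five inputs
    a, c = lstm_step(a, c, x_t, params)
print(a.shape, c.shape)  # (4,) (4,)
```

The same parameters are reused at every time step; only the states `a` and `c` change as the sequence is read.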

Music Generation

Autoencoders

Encoder-Decoder

Image Encoding

Denoising

We can turn the autoencoder into a denoising algorithm by adding gradually increasing Gaussian noise to the inputs while still training it to reconstruct the clean images.

Clustering

We can run clustering algorithms in the latent space to find similar pictures.

What is the average of two faces?

We can linearly interpolate between two faces in the latent space.
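The interpolation itself is just a weighted average of two latent vectors. The sketch below assumes the codes `z_a` and `z_b` have already been produced by some encoder; the vectors and the helper `interpolate` are made up for illustration, and decoding each point back to an image would need a trained decoder that is not shown here.

```python
import numpy as np

def interpolate(z_a, z_b, n_steps):
    """Evenly spaced points on the straight line from z_a to z_b."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.array([(1 - alpha) * z_a + alpha * z_b for alpha in alphas])

# Hypothetical latent codes of two faces.
z_a = np.array([0.0, 2.0, -1.0])
z_b = np.array([4.0, 0.0, 1.0])

path = interpolate(z_a, z_b, 5)
print(path[2])  # [2. 1. 0.]
```

The midpoint `path[2]` is the "average face" in latent space; passing it through the decoder gives the averaged picture.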

Generative Models

Steps:

You must make a friend of horror

Image to caption