Deep Learning in Action

Action!

Sources: [1],[3],[4],[5],[6]

So what is a neural network?

Biological neuron and artificial neuron

Source: [10]

Prototype of a neuron: the perceptron (Rosenblatt, 1958)

Source: [10]

Deep neural networks: introducing hidden layers

Source: [10]

Deep neural networks as function composition

 

A deep representation is a composition of many functions

 

\( x \xrightarrow{w_1} h_1 \xrightarrow{w_2} h_2 \xrightarrow{w_3} \dots \xrightarrow{w_n} h_n \xrightarrow{w_{n+1}} y \)
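To make the composition concrete, here is a minimal NumPy sketch; the layer sizes and the tanh nonlinearity are arbitrary choices for illustration:

```python
import numpy as np

def layer(W, b):
    """Return a function h(x) = tanh(W x + b) -- one layer of the composition."""
    return lambda x: np.tanh(W @ x + b)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # input -> two hidden layers -> output (arbitrary)
layers = [layer(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=4)
h = x
for f in layers:      # y = f3(f2(f1(x))): a composition of functions
    h = f(h)
print(h)              # the network's output y
```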

Why go deep? A bit of background

 

Easy? Difficult?

  • walk
  • talk
  • play chess
  • solve matrix computations

Easy for us - difficult for computers

 

  • controlled movement
  • speech recognition
  • speech generation
  • object recognition and object localization

Representation matters

Source: [12]

Just feed the network the right features?

 

What are the correct pixel values for a “bike” feature?

  • race bike, mountain bike, e-bike?
  • pixels in shadow may be much darker
  • what if the bike is mostly obscured by the rider standing in front of it?

Let the network pick the features

… a layer at a time

Source: [12]

So how does a network learn?

 

Just a sec - let's meet a real neural network first!

Play around in the browser:

So how DOES a neural network learn?

 

We need:

  • a way to quantify our current (e.g., classification) error
  • a way to reduce error on subsequent iterations
  • a way to propagate the error signal from the output layer all the way back through the network!

Quantifying error: Loss functions

 

The loss (or cost) function quantifies the cost incurred by wrong predictions / misclassifications

Probably the best-known loss function in machine learning is mean squared error:

\( \frac{1}{n} \sum_{i=1}^{n}{(\hat{y}_i - y_i)^2} \)

Most of the time, in deep learning we use cross entropy:

\( - \sum_j{t_j \log(y_j)} \)

This is the negative log probability of the right answer.
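As a concrete illustration, here are both losses in a few lines of NumPy (the vectors y and t below are made-up values):

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error: (1/n) * sum_i (y_hat_i - y_i)^2."""
    return np.mean((y_hat - y) ** 2)

def cross_entropy(y, t):
    """Cross entropy -sum_j t_j * log(y_j), for one-hot targets t
    and predicted class probabilities y."""
    return -np.sum(t * np.log(y))

y = np.array([0.7, 0.2, 0.1])   # predicted class probabilities
t = np.array([1.0, 0.0, 0.0])   # one-hot: the right answer is class 0
print(cross_entropy(y, t))      # = -log(0.7), the negative log prob of the right answer
```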

Learning from errors: Gradient Descent

Source: [12]
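A minimal gradient descent sketch: fit a one-parameter linear model to toy data by repeatedly stepping against the MSE gradient (the learning rate and the data are illustrative):

```python
import numpy as np

# Toy data: y = 3x + noise; we fit y_hat = w * x by gradient descent on MSE.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w, lr = 0.0, 0.1                         # initial weight, learning rate (illustrative)
for step in range(50):
    y_hat = w * x
    grad = np.mean(2 * (y_hat - y) * x)  # d/dw of (1/n) sum (w*x - y)^2
    w -= lr * grad                       # step against the gradient
print(w)                                 # close to 3.0
```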

Propagate back errors ... well: Backpropagation!

 

  • basically, just the chain rule: \( \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} \)
  • chained over several layers:
Source: [14]

Forward pass and backward pass: Intuition

  • imagine the output of \( f = (x + y) * z = -12 \) “wants” to get bigger
  • this could happen by \( q \) getting smaller -> \( q \) receives negative gradient \( \frac{df}{dq} = -4 \)
  • \( q \) just passes on this gradient to \( x \) and \( y \), as \( \frac{dq}{dx} = 1 \) and \( \frac{dq}{dy} = 1 \)
  • alternatively, it could happen by \( z \) getting bigger -> \( z \) receives positive gradient \( \frac{df}{dz} = 3 \)
Source: [13]
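The same computation in code (a minimal sketch; x = -2, y = 5, z = -4 is one assignment consistent with the slide's numbers):

```python
# Forward and backward pass for f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # q = 3
f = q * z          # f = -12

# backward pass (chain rule, applied from the output back)
df_dq = z          # -4: q should get smaller for f to grow
df_dz = q          #  3: z should get bigger for f to grow
df_dx = df_dq * 1  # -4: q just passes its gradient on, since dq/dx = 1
df_dy = df_dq * 1  # -4: likewise, dq/dy = 1
print(df_dx, df_dy, df_dz)
```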

Applications by example

 

  • CNNs (Convolutional Neural Networks) for computer vision
  • RNNs (Recurrent Neural Networks) for Natural Language Processing
  • Deep Reinforcement Learning for real-life learning

Easy vs. hard, revisited

 

VISION

Why computer vision is hard

Source: [15]

Tasks in computer vision

Source: [13]

Classification

 

In classification, the required output is a probability for each class.

Source: [13]
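The standard way to turn the network's raw outputs into a probability per class is a softmax; a minimal sketch:

```python
import numpy as np

def softmax(logits):
    """Turn raw network outputs (logits) into a probability per class."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.66, 0.24, 0.10]; sums to 1
```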

Localization

 

In localization, the network needs to identify the position of an object in an image.

Source: [17]

Object Detection

 

In object detection, the network has to classify and localize multiple objects in an image.

Source: [18]

Segmentation

 

In segmentation, the network needs to predict a class value for each input pixel.

Source: [19]

Semantic vs. Instance Segmentation

 

Semantic segmentation assigns a class to every pixel; instance segmentation additionally distinguishes between individual instances of the same class.

Source: [20]

How do we identify the required features? Enter:

 

Convolutional Neural Networks

A convolutional neural network

 

Source: [13]

The convolution operation

 

(Strictly speaking, this is cross-correlation, not convolution, but for learned filters the distinction doesn't matter)

Source: [13]
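A minimal NumPy sketch of the operation (stride 1, no padding, so the output shrinks by kernel size minus one):

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image and take the elementwise
    product sum at each position (stride 1, 'valid' region only)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out
```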

Gimp demo

 

Blur: \( \frac{1}{9}\begin{bmatrix}1 & 1 & 1\\1 & 1 & 1\\1 & 1 & 1\end{bmatrix} \) (normalized so overall brightness is preserved), sharpen: \( \begin{bmatrix}0 & -1 & 0\\-1 & 5 & -1\\0 & -1 & 0\end{bmatrix} \), edge detect: \( \begin{bmatrix}0 & 1 & 0\\1 & -4 & 1\\0 & 1 & 0\end{bmatrix} \)

see: https://docs.gimp.org/en/plug-in-convmatrix.html
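To try the same kernels outside GIMP, here's a sketch using scipy.ndimage.convolve on a stand-in random grayscale image:

```python
import numpy as np
from scipy.ndimage import convolve

image = np.random.rand(64, 64)  # stand-in for a grayscale image

blur = np.ones((3, 3)) / 9.0    # box blur, normalized by 1/9
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
edges = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

# ndimage.convolve flips the kernel (true convolution); these kernels are
# symmetric, so the result equals cross-correlation.
for name, k in [("blur", blur), ("sharpen", sharpen), ("edges", edges)]:
    print(name, convolve(image, k, mode="reflect").shape)
```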

Easy vs. hard, revisited

 

VISION

LANGUAGE

Until now, all we've seen are static snapshots

 

How do we handle sequences?

  • language: words, sentences, paragraphs…
  • all kinds of serial information: sensor data, stock prices…


 

Jane walked into the room. John walked in too. It was late in the day, and everyone was walking home after a long day at work. Jane said hi to ___

Source: [21]

How do we remember the past? Enter:

 

Recurrent neural networks

Hidden state

 

Sources: [22], [12]

Basic RNN closeup

 

The basic RNN at every step combines new input and existing state.

Source: [22]
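One step of such an RNN in NumPy; the sizes are arbitrary and the tanh nonlinearity is the usual choice:

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    """One step of a basic RNN: combine new input x with existing state h."""
    return np.tanh(W_x @ x + W_h @ h + b)

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5                    # illustrative sizes
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                   # initial state
for x in rng.normal(size=(4, n_in)):     # a sequence of 4 inputs
    h = rnn_step(x, h, W_x, W_h, b)      # the state carries the past forward
print(h)
```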

Remembering is not enough

 

Sometimes we also need to forget!

Two kinds of state: the LSTM "conveyor belt"

 

The LSTM (Long Short-Term Memory) architecture adds a second kind of state, the cell state

Source: [22]

LSTM cell state and the three gates

 

The LSTM cell state is protected by three gates, the forget, input, and output gates:

Source: [22]
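A sketch of one LSTM step in NumPy with the three gates made explicit; the weights are random, and stacking all four transforms into one matrix W is just a common convenience:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [h; x] to the stacked pre-activations of the
    forget gate, input gate, candidate values, and output gate."""
    z = W @ np.concatenate([h, x]) + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # the three gates
    g = np.tanh(g)                                # candidate cell values
    c = f * c + i * g      # forget some old state, write some new state
    h = o * np.tanh(c)     # the output gate decides what the cell reveals
    return h, c

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = rng.normal(size=(4 * n_h, n_h + n_in))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```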

RNN Example: Machine Translation

 

In translation, we have two sets of sequential data: one on the source side and one on the target side!

Enter: sequence-to-sequence models

Source: [23]

Real-life seq-2-seq: Google's Neural Machine Translation System

 

Source: [24]

Easy vs. hard, revisited

 

VISION

LANGUAGE

VISION <-> LANGUAGE

Generating image captions

 

Source: [26]

Easy vs. hard, revisited

 

VISION

LANGUAGE

VISION <-> LANGUAGE

VISION <-> SOUND

Linking video and sound: adding audio to silent film

Source: [27]

Easy vs. hard, revisited

 

VISION

LANGUAGE

VISION <-> LANGUAGE

VISION <-> SOUND

LIFE (SORT OF)

Life

 

Reinforcement learning

Source: [1]

Reinforcement learning: the task

 

Source: [1]

Reinforcement learning: The problem

 

If I get a reward many, many actions later…

… how do I find out which concrete action I'm being rewarded for? (This is the credit assignment problem.)

Reinforcement learning: The dilemma

 

EXPLOITATION vs. EXPLORATION

How to learn from delayed rewards: Q-Learning

 

Learn to maximize future (discounted) reward:

Source: [1]
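The referenced update is presumably the standard tabular Q-learning rule, \( Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \). A minimal sketch, with illustrative sizes and hyperparameters (deep variants replace the table with a network):

```python
import numpy as np

n_states, n_actions = 10, 4         # illustrative sizes
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99            # learning rate, discount factor

def q_update(s, a, r, s_next):
    """Tabular Q-learning: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```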

The quest for real intelligence

 

  • supervised learning: reasoning by memorization
  • pure reinforcement learning: reasoning by trial-and-error
  • can we get less brute-force here?

 

 

“Reinforcement learning + deep learning = AI” (David Silver, Google DeepMind)

Deep Reinforcement Learning @AlphaGo

 

Source: [29]

Deep Learning, where to go next?

 

For structured reading:

Just wanna have some cool fun?

Thanks for your attention!

Sources (1)

Sources (2)

Sources (3)

Sources (4)