Sigrid Keydana, Trivadis
02/06/2018
This is part 2; please see part 1 first :-)
However deep we make our network, if we just chain layers of matrix multiplication one after another, all we get is a linear combination of the inputs.
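To see this concretely, here is a quick check in R (a toy example with made-up weight matrices and no biases): composing two linear layers collapses into a single matrix multiplication.

# Chaining two purely linear layers: W2 %*% (W1 %*% x) is the same mapping
# as a single weight matrix W2 %*% W1
set.seed(42)
x  <- matrix(rnorm(4), ncol = 1)        # input vector
W1 <- matrix(rnorm(3 * 4), nrow = 3)    # "layer 1" weights
W2 <- matrix(rnorm(2 * 3), nrow = 2)    # "layer 2" weights

two_layers   <- W2 %*% (W1 %*% x)
single_layer <- (W2 %*% W1) %*% x
all.equal(two_layers, single_layer)     # TRUE: still just one linear map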
How can we solve non-linear problems with neural networks?
Enter: non-linear activation functions
# Sigmoid activation: squashes its input into the interval (0, 1)
sigmoid <- function(x) 1 / (1 + exp(-x))
x <- seq(-10, 10, by = 0.01)
plot(x, sigmoid(x), type = "l", xlab = "", ylab = "")
# ReLU activation: zeroes out negative inputs, passes positive inputs through
relu <- function(x) {
  x[x < 0] <- 0
  x
}
x <- seq(-10, 10, by = 0.01)
plot(x, relu(x), type = "l", xlab = "", ylab = "")
# tanh activation: squashes its input into the interval (-1, 1)
x <- seq(-10, 10, by = 0.01)
plot(x, tanh(x), type = "l", xlab = "", ylab = "")
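Put a non-linearity like the ReLU between the matrix multiplications and the collapse into a single linear map no longer happens. A minimal sketch of a one-hidden-layer forward pass, reusing the relu defined above (weights are arbitrary, for illustration only):

# Forward pass of a tiny one-hidden-layer network: the relu between the two
# matrix multiplications is what makes the overall mapping non-linear
forward <- function(x, W1, W2) W2 %*% relu(W1 %*% x)

set.seed(1)
W1 <- matrix(rnorm(3 * 2), nrow = 3)   # input -> hidden weights
W2 <- matrix(rnorm(1 * 3), nrow = 1)   # hidden -> output weights
x  <- matrix(c(0.5, -1), ncol = 1)
forward(x, W1, W2)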
LeNet [33]: the first successful application of convolutional neural networks, by Yann LeCun, Yoshua Bengio et al.
| Source: [24] |
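Just to make this tangible: a LeNet-style stack could be sketched with the keras R package roughly as below. The layer sizes follow the classic LeNet layout only approximately and are purely illustrative, not a faithful reproduction of [33].

library(keras)

# Rough LeNet-style convnet for 28x28 grayscale images (sizes illustrative)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5), activation = "tanh",
                input_shape = c(28, 28, 1)) %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 16, kernel_size = c(5, 5), activation = "tanh") %>%
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 120, activation = "tanh") %>%
  layer_dense(units = 84, activation = "tanh") %>%
  layer_dense(units = 10, activation = "softmax")   # class probabilities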
In classification, the required output is a probability for each class.
| Source: [24] |
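The standard way to turn the network's raw scores (logits) into class probabilities is the softmax; a minimal sketch (class names and scores made up):

# Softmax: exponentiate (shifted by the max for numerical stability) and
# normalize so the outputs sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

logits <- c(cat = 2.0, dog = 1.0, bird = -1.0)   # made-up raw scores
round(softmax(logits), 3)                        # probabilities summing to 1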
In localization, the network needs to identify the position of an object in an image.
| Source: [25] |
In object detection, the network has to classify and localize multiple objects in an image.
| Source: [26] |
In segmentation, the network needs to predict a class value for each input pixel.
| Source: [27] |
Semantic segmentation segments by class, instance segmentation by class instance.
| Source: [28] |
How do we handle sequences?
When Peter came home from work, dinner again wasn't ready. What the heck had Alicia been doing all day?
Frustrated, ___ opened another bottle of beer.
| Sources: [29], [31] |
At every step, the basic RNN combines the new input with its existing state.
| Source: [29] |
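In symbols: h_t = tanh(W_x x_t + W_h h_{t-1} + b). A minimal R sketch of this single step (all dimensions and names chosen arbitrarily for illustration):

# One step of a vanilla RNN: the new state is a non-linear combination of
# the current input and the previous state
rnn_step <- function(x, h_prev, W_x, W_h, b) {
  tanh(W_x %*% x + W_h %*% h_prev + b)
}

set.seed(7)
W_x <- matrix(rnorm(3 * 2), nrow = 3)   # input-to-state weights
W_h <- matrix(rnorm(3 * 3), nrow = 3)   # state-to-state weights
b   <- rep(0, 3)
h   <- matrix(0, nrow = 3, ncol = 1)    # initial state

inputs <- list(matrix(c(1, 0), ncol = 1), matrix(c(0, 1), ncol = 1))
for (x in inputs) h <- rnn_step(x, h, W_x, W_h, b)   # the state carries history
h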
Sometimes we also need to forget!
The LSTM (Long Short-Term Memory) architecture adds an additional state layer, the cell state:
| Source: [29] |
The LSTM cell state is protected by three gates, the forget, input, and output gates:
| Source: [29] |
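A minimal R sketch of one LSTM step, reusing the sigmoid defined earlier (all names and dimensions are made up for illustration); the point is only to show where the three gates act:

# One LSTM step: three sigmoid gates control what the cell state forgets,
# what new information it takes in, and what it exposes as the hidden state
lstm_step <- function(x, h_prev, c_prev, W, U, b) {
  gate <- function(name, act) act(W[[name]] %*% x + U[[name]] %*% h_prev + b[[name]])
  f <- gate("forget", sigmoid)    # forget gate: what to erase from the cell state
  i <- gate("input",  sigmoid)    # input gate: what new information to let in
  o <- gate("output", sigmoid)    # output gate: what to expose as the hidden state
  c_tilde <- gate("cell", tanh)   # candidate cell state
  c_new <- f * c_prev + i * c_tilde
  h_new <- o * tanh(c_new)
  list(h = h_new, c = c_new)
}

set.seed(11)
mk <- function() matrix(rnorm(4), nrow = 2)
W <- list(forget = mk(), input = mk(), output = mk(), cell = mk())
U <- list(forget = mk(), input = mk(), output = mk(), cell = mk())
b <- list(forget = rep(0, 2), input = rep(0, 2), output = rep(0, 2), cell = rep(0, 2))

x      <- matrix(c(1, -1), ncol = 1)   # current input
h_prev <- matrix(0, 2, 1)              # previous hidden state
c_prev <- matrix(0, 2, 1)              # previous cell state
lstm_step(x, h_prev, c_prev, W, U, b)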
In translation, we have two sets of sequential data, one on the source and one on the target side!
Enter: sequence-to-sequence models
| Source: [30] |
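A minimal sketch of the idea, reusing rnn_step and softmax from above (everything here is made up for illustration): the encoder reads the source sequence into its final state, and that state initializes the decoder, which unrolls on the target side.

# Encoder: run the RNN over the source sequence and keep only the final state
encode <- function(source_seq, h0, W_x, W_h, b) {
  h <- h0
  for (x in source_seq) h <- rnn_step(x, h, W_x, W_h, b)
  h
}

# Decoder (greatly simplified, no attention): start from the encoder state and
# unroll, feeding each predicted token distribution back in as the next input
decode <- function(h, start_token, n_steps, W_x, W_h, b, W_out) {
  outputs <- vector("list", n_steps)
  x <- start_token
  for (t in seq_len(n_steps)) {
    h <- rnn_step(x, h, W_x, W_h, b)
    y <- softmax(W_out %*% h)   # distribution over target-side tokens
    outputs[[t]] <- y
    x <- y                      # toy setup: next input is the previous output
  }
  outputs
}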
| Source: [32] |
Demo:
In PyTorch, computation graphs are dynamic. Thus:
Demo:
Thanks a lot for your attention!!
[1] https://cacm.acm.org/news/221108-artificial-intelligence-pioneer-says-we-need-to-start-over
[2] Wikipedia
[3] https://crude2refined.wordpress.com/2015/08/14/my-inns-big-data-conference-review-part-1/
[4] MIT 6.S094 Deep Learning for Self-Driving Cars Lecture Slides
[5] Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
[6] Esteva et al. Dermatologist-level classification of skin cancer with deep neural networks
[7] Wikipedia. AlphaGo versus Lee Sedol
[8] Yoshihara et al. Leveraging temporal properties of news events for stock market prediction
[18] https://ajolicoeur.wordpress.com/cats/
[20] https://deepmind.com/research/alphago/alphago-china/
[21] https://deepmind.com/blog/alphago-zero-learning-scratch/
[22] http://ruder.io/optimizing-gradient-descent/
[23] https://colah.github.io/posts/2015-08-Backprop/
[24] Stanford CS231n Convolutional Neural Networks Lecture Notes
[25] Erhan et al. Scalable Object Detection using Deep Neural Networks
[26] ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014)
[27] Long et al. Fully Convolutional Networks for Semantic Segmentation
[28] Silberman et al. Instance Segmentation of Indoor Scenes using a Coverage Loss
[29] Chris Olah. Understanding LSTM Networks
[30] Tensorflow seq2seq tutorial
[31] Goodfellow et al. 2016, Deep Learning
[32] TensorFlow Documentation: Graphs and Sessions
[33] LeCun et al., Gradient-based learning applied to document recognition.