class: center, middle, inverse, title-slide

# An invitation to deep learning with R
### Sigrid Keydana, RStudio
### NairobiR, 08/01/2020

---

class: middle center section

# Talk deep learning, talk data

---

# Two faces of data

---

# The promise

- Data-driven / empirical decision making (as opposed to ...)

- So many problems we could possibly solve ...

- _Deep learning_: Data + compute power + algorithms (+ ...?)

---

# The threat

- Lots of ways this can go wrong:

  - increase inequality, widen gaps
  
  - deepen disadvantages for those already discriminated against
  
  - reinforce _bias_
  
- What can be biased:

  - data
  
  - algorithms
  
  - people, societies, systems ...

---

# A few starting points

(_from someone who's just starting herself_)

- [Timnit Gebru & Emily Denton, Tutorial on Fairness, Accountability, Transparency, and Ethics in Computer Vision at CVPR 2020](https://sites.google.com/view/fatecv-tutorial/schedule)

- [Ben Green, Data Science as Political Action: Grounding Data Science in a Politics of Justice.](https://arxiv.org/abs/1811.03435)

- [Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence](https://arxiv.org/abs/2007.04068)

- ... and all the "classics": Ruha Benjamin, _Race After Technology_; C. D'Ignazio & L. F. Klein, _Data Feminism_ ...

---
# That said:

> _"it's better to know than to not know"_

- Learning about deep learning is learning:

  - what is possible now

  - what could be possible soon

  - what to watch out for

  - what could be used to make this world better

---
class: middle center section

# What is deep learning?

---
# Situating deep learning

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/learning.png" width = 800px height=500px/>
<figcaption>Source: <a href="https://www.deeplearningbook.org/">Goodfellow et al., Deep Learning. 2016.</a></figcaption>
</figure>

---
# A neural network

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/playground.png" width = 90%/>
<figcaption><a href="https://playground.tensorflow.org/">TensorFlow playground</a></figcaption>
</figure>

---
# Representation matters

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/representation_matters.png" width = 800px height=500px/>
<figcaption>Source: <a href="https://www.deeplearningbook.org/">Goodfellow et al., Deep Learning. 2016.</a></figcaption>
</figure>

---
# Why depth

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/feature_hierarchy.png" width = 60%/>
<figcaption>Source: <a href="https://www.deeplearningbook.org/">Goodfellow et al., Deep Learning. 2016.</a></figcaption>
</figure>

---
class: middle center section

# Meet the actors

---
# Layers of neurons

Neurons are units of computation, arranged in layers. Each neuron aggregates the inputs from its incoming connections, transforms the aggregate, and passes it on to the next layer.
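
---
# A single neuron, in code

As a minimal sketch in plain base R (no deep learning package involved), this is what one neuron computes: aggregate, transform, pass on. The inputs, weights, bias and the choice of ReLU as the transformation are all made-up values for illustration.

```r
# Hypothetical values arriving over three incoming connections
inputs  <- c(1.5, -2.0, 0.5)
# Hypothetical connection weights and bias (normally learned, not set by hand)
weights <- c(0.8, 0.1, -0.4)
bias    <- 0.2

# 1. aggregate: weighted sum of the inputs, plus the bias
aggregate <- sum(inputs * weights) + bias

# 2. transform: here, a ReLU, i.e. max(0, z)
relu <- function(z) pmax(0, z)
output <- relu(aggregate)

# 3. pass on: `output` is what the neurons in the next layer receive
output
```

Here the aggregate is 1.2 - 0.2 - 0.2 + 0.2 = 1, which ReLU passes through unchanged.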

---
# Activations

Activations are the actual _values_ passed from one neuron to another.

                            my output is 22.2
       layer 1, neuron 1 ---------------------> layer 2, neuron 1

                            my output is -0.07
       layer 1, neuron 2 ---------------------> layer 2, neuron 2

Often activations are the result of applying an _activation function_ to the raw aggregate of incoming values.
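
---
# Activation functions, in code

A few common activation functions, written out in base R as a sketch. The raw aggregates `z` are hypothetical; `tanh()` is built into base R.

```r
relu    <- function(z) pmax(0, z)          # 0 for negative input, identity otherwise
sigmoid <- function(z) 1 / (1 + exp(-z))   # squashes to (0, 1)

z <- c(-2, 0, 2)   # hypothetical raw aggregates

# one row per activation function, one column per input value
rbind(relu = relu(z), sigmoid = sigmoid(z), tanh = tanh(z))
```

ReLU is what the Keras model later in this talk uses in its hidden layers (`activation = "relu"`).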

---
# Loss

The _loss_ measures how far the network's output is from the target (ground truth).

    output neuron 1: I say 0.33 ---------> target: sorry, should be 25

      ---> absolute error is 24.67
      ---> squared error is 608.6089
      ---> ...
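
---
# Computing the loss

The two losses from the previous slide, computed in base R. Over many outputs, the per-output errors are usually averaged; the extra predictions and targets in the second half are made up for illustration.

```r
prediction <- 0.33
target     <- 25

absolute_error <- abs(target - prediction)   # about 24.67
squared_error  <- (target - prediction)^2    # about 608.6089

# Averaged over several outputs: mean absolute error (MAE), mean squared error (MSE)
predictions <- c(0.33, 10, 4)   # hypothetical outputs
targets     <- c(25, 12, 4)     # hypothetical targets
mae <- mean(abs(targets - predictions))
mse <- mean((targets - predictions)^2)
```

Squaring punishes large errors much more than small ones: the single 24.67 miss dominates the MSE far more than it dominates the MAE.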

---
# Weights

Weights are connection strengths _learned by_ the network.

                            weight = 0.77
       layer 1, neuron 1 ---------------------> layer 2, neuron 1

                            weight = 0.02
       layer 1, neuron 1 ---------------------> layer 2, neuron 2
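
---
# Weights as a matrix

Collecting all weights between two layers in a matrix turns the aggregation step of a whole layer into one matrix product. This is what a fully connected (dense) layer computes before applying its activation function. The sketch reuses the two weights from the previous slide; every other number is hypothetical.

```r
# Weight matrix: one row per neuron in layer 1, one column per neuron in layer 2
W <- matrix(c(0.77, 0.02,    # outgoing weights of layer 1, neuron 1 (from the slide)
              -0.50, 0.30),  # outgoing weights of layer 1, neuron 2 (hypothetical)
            nrow = 2, byrow = TRUE)
b  <- c(0.1, -0.1)           # one bias per layer-2 neuron (hypothetical)
a1 <- c(1, 2)                # activations coming out of layer 1 (hypothetical)

# Aggregation for the whole layer at once: matrix product plus bias
z2 <- as.vector(a1 %*% W) + b

# The same computation, neuron by neuron
z2_loop <- c(sum(a1 * W[, 1]) + b[1], sum(a1 * W[, 2]) + b[2])
```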

---
# Optimization

The process of finding good weights, based on the current loss.

_Backpropagation_ computes how the loss changes with respect to each weight (the gradients); the _optimizer_ uses these gradients to compute and apply the weight updates.

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/colah.png" width = 80%/>
<figcaption>Source: <a href="https://colah.github.io/posts/2015-08-Backprop/">Chris Olah's post on backprop</a></figcaption>
</figure>
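
---
# Optimization, in code

A sketch of plain gradient descent for a model with just one weight, in base R. The toy data, learning rate and number of steps are assumptions. For a one-weight model the gradient can be written down by hand; for deep networks, backpropagation computes it automatically via the chain rule.

```r
# Toy data generated by y = 3 * x, so the ideal weight is 3
x <- c(1, 2, 3, 4)
y <- 3 * x

w  <- 0      # initial weight
lr <- 0.01   # learning rate (step size), chosen by hand

for (step in 1:200) {
  pred <- w * x
  loss <- mean((pred - y)^2)         # mean squared error
  grad <- mean(2 * (pred - y) * x)   # derivative of the loss with respect to w
  w    <- w - lr * grad              # step against the gradient
}
w
```

Each step shrinks the distance to 3 by a constant factor here, so after 200 steps `w` is 3 up to rounding.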

---
# In a nutshell

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/fchollet.png" width = 60%/>
<figcaption>Source: F. Chollet & JJ Allaire, Deep Learning with R. 2018</figcaption>
</figure>

---
class: middle center section

# Deep learning with R

---
# TensorFlow / Keras - from R?

Yes!

- Website (installation, tutorials, guides ...): https://tensorflow.rstudio.com/tutorials/

- GitHub: https://github.com/rstudio/tensorflow, https://github.com/rstudio/keras

- Blog: https://blogs.rstudio.com/ai/

- Book (1st ed., 2018): F. Chollet & JJ Allaire, Deep Learning with R.

- YouTube: https://www.youtube.com/c/mlverse

---
# Install

```r
# install R packages
install.packages("tensorflow")
install.packages("keras")
install.packages("tfdatasets")

library(tensorflow)
# installs the Python backend
# by default, will create a dedicated Miniconda environment
# named r-reticulate
# if you don't have Miniconda installed, you'll be prompted whether
# reticulate may install it for you
install_tensorflow()

# what is the TensorFlow (Python) version,
# and what environment does it run in?
tensorflow::tf_config()
```

No separate installation of (Python) Keras is required!

---
# Now let's assemble a neural network!

They all look like penguins to me ...

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/lter_penguins.png" width = 70%/>
<figcaption>Artwork by @allisonhorst</figcaption>
</figure>

---
# Data prep

```r
library(tidyverse)
library(palmerpenguins)

penguins_df <- na.omit(penguins)
penguins_df <- penguins_df %>% 
  # we need numerical input, preferably starting from 0;
  # conveniently, the categorical columns (species, island, sex) are already factors
  mutate_if(is.factor, (function (x) as.numeric(x) - 1)) 
```

```
$ species <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ island <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, …
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, …
$ body_mass_g <int> 3750, 3800, 3250, 3450, 3650, …
$ sex <dbl> 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, …
$ year <int> 2007, 2007, 2007, 2007, 2007, …
```

---
# We have a mix of numerical and categorical predictors

Problems?

Options?

- one-hot encoding

- embedding
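
---
# One-hot encoding vs. embedding

A base-R sketch of both options for a small, hypothetical factor (using island names from `palmerpenguins`). The embedding matrix is random here; in a network it would be learned. Multiplying a one-hot row by the embedding matrix just picks out one row, which is why an embedding layer can be implemented as a simple lookup.

```r
island <- factor(c("Biscoe", "Dream", "Torgersen", "Dream"))

# One-hot encoding: one column per level, exactly one 1 per row
one_hot <- diag(nlevels(island))[as.integer(island), ]
colnames(one_hot) <- levels(island)

# Embedding: each level is mapped to a dense vector (here of length 2);
# random here, but learned during training in a real network
set.seed(1)
embedding <- matrix(rnorm(nlevels(island) * 2), nrow = nlevels(island))

# Looking up rows by level index ...
by_lookup <- embedding[as.integer(island), ]
# ... gives the same result as multiplying the one-hot matrix by the embedding matrix
by_product <- one_hot %*% embedding
```

One-hot encoding grows one column per level, while the embedding size is a free choice; that makes embeddings attractive for categorical variables with many levels.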

---
# Let's use one-hot encoding here

```r
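library(keras)  # for to_categorical()

x <- penguins_df %>% 
  select(-species) %>%
  # one-hot encode the two categorical predictors,
  # turning each into an n x (number of levels) matrix
  map_at("island", ~ to_categorical(.x) %>% 
           array_reshape(c(length(.x), length(levels(penguins$island))))) %>%
  map_at("sex", ~ to_categorical(.x) %>% 
           array_reshape(c(length(.x), length(levels(penguins$sex))))) %>%
  # bind all predictors column-wise into a single matrix
  abind::abind(along = 2)

# the target: species, one-hot encoded (3 columns)
y <- penguins_df %>%
  select(species) %>% 
  pull() %>%
  to_categorical()

# random train/validation split; the seed (arbitrary) makes it reproducible
set.seed(777)
train_indices <- sample(1:nrow(x), 200)
x_train <- x[train_indices, ]
x_val <- x[-train_indices, ]
y_train <- y[train_indices, ]
y_val <- y[-train_indices, ]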
```
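
---
# Aside: scaling the numeric predictors

The numeric predictors above live on very different scales (body mass in the thousands, bill depth around 20, year around 2007), and neural networks usually train more easily on inputs of comparable magnitude. A common remedy, sketched here as an assumption rather than as part of the original code, is to standardize each numeric column using statistics from the training rows only. The small matrices below are hypothetical stand-ins for the numeric columns of `x_train` and `x_val`.

```r
# Hypothetical stand-ins for numeric training and validation columns
set.seed(42)
x_train_demo <- cbind(mass = rnorm(20, 4200, 800), depth = rnorm(20, 17, 2))
x_val_demo   <- cbind(mass = rnorm(5, 4200, 800),  depth = rnorm(5, 17, 2))

# Statistics from the training set only, so no validation information leaks in
mu    <- colMeans(x_train_demo)
sigma <- apply(x_train_demo, 2, sd)

# Apply the same transformation to both sets
x_train_scaled <- scale(x_train_demo, center = mu, scale = sigma)
x_val_scaled   <- scale(x_val_demo,   center = mu, scale = sigma)
```

The 0/1 one-hot columns would typically be left as they are.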

---
# Define a network

Excursion: three ways of defining a model in Keras

- sequential API: `keras_model_sequential()`

- functional API: `keras_model()`

- custom models: `keras_model_custom()`

---
# We need: A model

With the sequential API, a model definition is basically a list of layers, applied in order.

```r
model <- keras_model_sequential() %>%
  # a fully connected layer; as the first layer, it is told
  # how many input features to expect (the columns of x_train)
  layer_dense(
    units = 32,            # this layer has 32 "neurons"
    activation = "relu",   # activation function is relu
    input_shape = ncol(x_train)
  ) %>%
  # dropout: during training, each input is set to 0 with probability 0.2
  layer_dropout(0.2) %>%
  layer_dense(
    units = 32,
    activation = "relu"
  ) %>%
  layer_dropout(0.2) %>%
  # output layer has 3 units, for 3 classes
  layer_dense(units = 3, activation = "softmax")
```
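
---
# What dropout does

`layer_dropout(0.2)` is only active during training: each of its inputs is set to zero with probability 0.2. Keras, like most frameworks, uses "inverted dropout", scaling the surviving values by 1 / (1 - rate) so their expected value is unchanged and nothing needs to be done at prediction time. A base-R sketch of that idea; the function and the all-ones input are made up for illustration.

```r
dropout <- function(x, rate, training = TRUE) {
  if (!training) return(x)              # at prediction time: do nothing
  keep <- runif(length(x)) >= rate      # keep each value with probability 1 - rate
  x * keep / (1 - rate)                 # rescale the survivors
}

set.seed(123)
x <- rep(1, 10000)
dropped <- dropout(x, rate = 0.2)
mean(dropped == 0)   # roughly 0.2 of the values were zeroed
mean(dropped)        # roughly 1, thanks to the rescaling
```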

---
# We need: A loss function

In Keras, `compile()` specifies how the model is to be optimized. First: which loss to minimize.

```r
model %>% compile(
* loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = "accuracy"
)
```
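
---
# Categorical cross-entropy, by hand

For one example, categorical cross-entropy is minus the log of the probability the network assigns to the correct class. A base-R sketch with made-up raw outputs for the three penguin species; the softmax turns them into probabilities, as the `activation = "softmax"` output layer does. Keras averages this quantity over the examples in a batch.

```r
# subtracting max(z) avoids overflow in exp() without changing the result
softmax <- function(z) exp(z - max(z)) / sum(exp(z - max(z)))

categorical_crossentropy <- function(y_true, y_pred) {
  -sum(y_true * log(y_pred))
}

logits <- c(2.0, 0.5, -1.0)   # hypothetical raw outputs of the last layer
probs  <- softmax(logits)     # probabilities summing to 1
y_true <- c(1, 0, 0)          # one-hot target: this penguin belongs to class 1

loss <- categorical_crossentropy(y_true, probs)
all.equal(loss, -log(probs[1]))   # TRUE: only the true class contributes
```

A more confident correct prediction, such as `softmax(c(5, 0, 0))`, gives a smaller loss.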

---
# We need: An optimizer

```r
model %>% compile(
  loss = "categorical_crossentropy", 
* optimizer = optimizer_rmsprop(),
  metrics = "accuracy"
)
```

(And here, you also tell Keras which metrics to monitor.)
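
---
# What RMSprop does

`optimizer_rmsprop()` adapts the step size for each weight separately, dividing the gradient by a running root-mean-square of that weight's recent gradients. Below is a base-R sketch of one common formulation; the decay and epsilon values, the learning rate of 0.01 and the toy objective (w - 3)^2 are assumptions, and Keras' implementation may differ in details such as where epsilon enters.

```r
rmsprop_step <- function(w, grad, state, lr = 0.001, rho = 0.9, eps = 1e-7) {
  # running average of squared gradients
  state <- rho * state + (1 - rho) * grad^2
  # divide the step by the root of that average
  w <- w - lr * grad / (sqrt(state) + eps)
  list(w = w, state = state)
}

# Minimize (w - 3)^2, whose gradient is 2 * (w - 3)
w <- 0
state <- 0
for (i in 1:5000) {
  res   <- rmsprop_step(w, 2 * (w - 3), state, lr = 0.01)
  w     <- res$w
  state <- res$state
}
w   # close to 3
```

Because the gradient is normalized, the step size stays on the order of the learning rate, whether gradients are large or tiny.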

---
# Ready to train

We (almost) always want to monitor progress on a held-out validation set, even though this dataset is pretty small.

```r
history <- model %>% fit(
 x_train,
 y_train,
 validation_data = list(x_val, y_val),
 epochs = 100,
 batch_size = 8
)
```
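
---
# How many weight updates?

With `batch_size = 8`, the weights are updated once per batch, not once per pass over the data. A quick calculation of how many updates this `fit()` call performs, given the 200 training rows sampled earlier:

```r
n_train    <- 200   # training rows sampled earlier
batch_size <- 8
epochs     <- 100

steps_per_epoch <- ceiling(n_train / batch_size)   # 25 updates per epoch
total_steps     <- steps_per_epoch * epochs        # 2500 updates in total
```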

---
# So that was classification ...

Ready for something else? Enter _Generative Adversarial Networks_ (GANs):

<figure>
<img src = "https://raw.githubusercontent.com/skeydan/invitation_to_deep_learning_in_R/master/gan.png" width = 90%/>
<figcaption>Image source: https://arxiv.org/pdf/1710.07035.pdf</figcaption>
</figure>

---
# Ready for something else?

(Yeah, I asked already.)

---
class: middle center section

# Can you keep a secret?

---
# Please allow me to introduce myself ...

---
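# The task: Generate characters (not quite MNIST)

[Kuzushiji MNIST](https://github.com/rois-codh/kmnist): like MNIST, 70,000 grayscale images of 28 x 28 pixels in 10 classes, but showing cursive Japanese (Kuzushiji) characters instead of digits.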

---
# Get the data

In `torch`, we always work with `dataset`s and `dataloader`s ...

```r
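library(torch)
library(torchvision)  # provides kmnist_dataset()

# where to store the data (a hypothetical path; adjust as needed)
dir <- "~/Downloads/kmnist"
batch_size <- 128  # assumed value

# Kuzushiji MNIST is available via torchvision
kmnist <- kmnist_dataset(
  dir,
  download = TRUE,
  transform = function(x) {
    # scale pixel values to roughly [-1, 1] ...
    x <- x$to(dtype = torch_float())/256
    x <- 2*(x - 0.5)
    # ... and add a channel dimension
    x[newaxis,..]
  }
)

# create a dataloader that serves shuffled batches
dl <- dataloader(kmnist, batch_size = batch_size, shuffle = TRUE)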
```
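
---
# What the transform does

The `transform` maps raw pixel intensities (integers from 0 to 255) to roughly [-1, 1], the same range as the `nn_tanh()` output of the generator. The same arithmetic in base R, for a dark, a medium and a bright pixel:

```r
rescale <- function(x) {
  x <- x / 256     # as in the transform: [0, 1)
  2 * (x - 0.5)    # shift and stretch: [-1, 1)
}
rescale(c(0, 128, 255))   # -1, 0, 0.9921875
```

Dividing by 255 instead would map the brightest pixel to exactly 1.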

---
# Define the network ... (1/2)

```r
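# assumed values (not shown on the slides):
latent_input_size <- 100  # length of the noise vector the generator starts from
image_size <- 28          # used as the base number of feature maps (channels)

generator <- nn_module(
  "generator",
  initialize = function() {
    self$main = nn_sequential(
      # noise (latent_input_size x 1 x 1) -> 4 x 4 feature maps
      nn_conv_transpose2d(latent_input_size, image_size * 4, 4, 1, 0, bias = FALSE),
      nn_batch_norm2d(image_size * 4),
      nn_relu(),
      # -> 8 x 8
      nn_conv_transpose2d(image_size * 4, image_size * 2, 4, 2, 1, bias = FALSE),
      nn_batch_norm2d(image_size * 2),
      nn_relu(),
      # -> 14 x 14
      nn_conv_transpose2d(image_size * 2, image_size, 4, 2, 2, bias = FALSE),
      nn_batch_norm2d(image_size),
      nn_relu(),
      # -> one-channel 28 x 28 image, squashed to [-1, 1] by tanh
      nn_conv_transpose2d(image_size, 1, 4, 2, 1, bias = FALSE),
      nn_tanh()
    )
  },
  forward = function(x) {
    self$main(x)
  }
)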
```
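
---
# Why the generator outputs 28 x 28 images

For a transposed convolution (no output padding, dilation 1), the output side length is (input - 1) * stride - 2 * padding + kernel_size. In `nn_conv_transpose2d()`, the positional arguments after the two channel counts are kernel size, stride and padding. Applying the formula to the four layers of the generator, starting from a 1 x 1 noise "image":

```r
conv_transpose_out <- function(input, kernel, stride, padding) {
  (input - 1) * stride - 2 * padding + kernel
}

size <- 1                                  # noise enters as a 1 x 1 map
size <- conv_transpose_out(size, 4, 1, 0)  # 4
size <- conv_transpose_out(size, 4, 2, 1)  # 8
size <- conv_transpose_out(size, 4, 2, 2)  # 14
size <- conv_transpose_out(size, 4, 2, 1)  # 28
size
```

Note the padding of 2 in the third layer: with padding 1, the output would be 16 x 16 and, after the last layer, 32 x 32.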

---
# Define the network ... (2/2)

```r
discriminator <- nn_module(
  "discriminator",
  initialize = function() {
    self$main = nn_sequential(
      # 1 x 28 x 28 image -> 14 x 14
      nn_conv2d(1, image_size, 4, 2, 1, bias = FALSE),
      nn_leaky_relu(0.2, inplace = TRUE),
      # -> 7 x 7
      nn_conv2d(image_size, image_size * 2, 4, 2, 1, bias = FALSE),
      nn_batch_norm2d(image_size * 2),
      nn_leaky_relu(0.2, inplace = TRUE),
      # -> 3 x 3
      nn_conv2d(image_size * 2, image_size * 4, 4, 2, 1, bias = FALSE),
      nn_batch_norm2d(image_size * 4),
      nn_leaky_relu(0.2, inplace = TRUE),
      # -> 1 x 1: the probability that the input is real
      nn_conv2d(image_size * 4, 1, 4, 2, 1, bias = FALSE),
      nn_sigmoid()
    )
  },
  forward = function(x) {
    # flatten the (batch, 1, 1, 1) output to shape (batch), matching the labels
    self$main(x)$view(-1)
  }
)
```
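
---
# How the discriminator shrinks an image

For a regular convolution, the output side length is floor((input + 2 * padding - kernel_size) / stride) + 1. All four convolutions of the discriminator use kernel size 4, stride 2 and padding 1, so a 28 x 28 image shrinks to a single value per image:

```r
conv_out <- function(input, kernel, stride, padding) {
  floor((input + 2 * padding - kernel) / stride) + 1
}

size  <- 28
sizes <- size
for (layer in 1:4) {
  size  <- conv_out(size, kernel = 4, stride = 2, padding = 1)
  sizes <- c(sizes, size)
}
sizes   # 28 14 7 3 1
```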

---
# What else? A loss function ...

```r
criterion <- nn_bce_loss()
```

Binary cross-entropy. Used twice:

- The discriminator has to decide, for every input it gets, if it was real or generated. _That decision could be right or wrong_.

- The generator is judged by whether the discriminator's verdict on its creations is right or wrong. _Right is bad; wrong is good_.
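
---
# Binary cross-entropy, by hand

For a predicted probability p and a label y (1 = real, 0 = fake), binary cross-entropy is -(y * log(p) + (1 - y) * log(1 - p)), averaged over the batch (the default for `nn_bce_loss()`). A base-R sketch of both uses, with made-up discriminator outputs; the generator's loss labels its fakes as 1, so it is low exactly when the discriminator is fooled.

```r
bce <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))

d_real <- c(0.9, 0.8)   # hypothetical discriminator outputs on real images
d_fake <- c(0.2, 0.3)   # hypothetical discriminator outputs on generated images

# Discriminator: real should be judged 1, fake should be judged 0
disc_loss <- bce(d_real, 1) + bce(d_fake, 0)

# Generator: wants its fakes to be judged 1
gen_loss <- bce(d_fake, 1)
```

With these numbers the discriminator is doing well, so its loss is low and the generator's is high; outputs closer to 1 on fakes would reverse that.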

---
# ... and an optimizer

Two, actually: each network has its own.

```r
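# run on the GPU if there is one
device <- if (cuda_is_available()) torch_device("cuda:0") else torch_device("cpu")

# instantiate the two networks and move them to the device
gen <- generator()$to(device = device)
disc <- discriminator()$to(device = device)

learning_rate <- 0.0002

# this optimizer takes care of the discriminator's weights
disc_optimizer <- optim_adam(disc$parameters, lr = learning_rate)

# this optimizer takes care of the generator's weights
gen_optimizer <- optim_adam(gen$parameters, lr = learning_rate)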
```
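
---
# What Adam does

Adam keeps, per weight, running averages of the gradient and of its square, corrects both for having been initialized at zero, and steps by their ratio. A base-R sketch of the standard update, using the learning rate from the previous slide; the beta and epsilon values are commonly used defaults and are assumptions here, as are the weight and gradient values.

```r
adam_step <- function(w, grad, state, lr = 0.0002,
                      beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
  state$t <- state$t + 1
  state$m <- beta1 * state$m + (1 - beta1) * grad     # running mean of gradients
  state$v <- beta2 * state$v + (1 - beta2) * grad^2   # running mean of squared gradients
  m_hat <- state$m / (1 - beta1^state$t)              # bias corrections
  v_hat <- state$v / (1 - beta2^state$t)
  list(w = w - lr * m_hat / (sqrt(v_hat) + eps), state = state)
}

state <- list(m = 0, v = 0, t = 0)
res <- adam_step(w = 1, grad = 0.5, state = state)
res$w   # about 0.9998: the first step moves by about lr, whatever the gradient's size
```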

---
# And train!

```r
num_epochs <- 10  # assumed value

for (epoch in 1:num_epochs) {
  for (b in enumerate(dl)) {
    # labels: 1 for real images, 0 for generated ones
    y_real <- torch_ones(b[[1]]$size()[1], device = device)
    y_fake <- torch_zeros(b[[1]]$size()[1], device = device)
    # the generator turns random noise into fake images
    noise <- torch_randn(b[[1]]$size()[1], latent_input_size, 1, 1, device = device)
    fake <- gen(noise)
    img <- b[[1]]$to(device = device)

    # update discriminator: judge real images as real, fakes as fake
    # (fake$detach(): no gradients flow back into the generator here)
    disc_loss <- criterion(disc(img), y_real) + criterion(disc(fake$detach()), y_fake)
    disc_optimizer$zero_grad()
    disc_loss$backward()
    disc_optimizer$step()

    # update generator: it wants its fakes to be judged real
    gen_loss <- criterion(disc(fake), y_real)
    gen_optimizer$zero_grad()
    gen_loss$backward()
    gen_optimizer$step()
  }
}
```

---
# Monitoring generator skill over time ...

---
class: middle center section

# What now?

---
# Practice ...?

Yeah, and

_Play around ...?_

Yeah, and

_Have fun ...?_

Yeah, and

---
# Participate!

https://github.com/mlverse/torch _[... and a whole ecosystem to build around `torch` core ...]_

https://github.com/rstudio/tensorflow,
https://github.com/rstudio/keras,
https://github.com/rstudio/tfdatasets,
https://github.com/rstudio/tfruns,
https://github.com/rstudio/cloudml 
_[... and more ...]_

https://github.com/mlverse _[a place for model implementations, infrastructure helpers, ... what have you]_

---
class: black

# Thank you so much for listening!