Hey all!

RStudio recently released an interface to the Keras API. This is pretty cool, since it lets people use R as a front end to the state-of-the-art TensorFlow/cuDNN toolchain. They already released a TensorFlow package last year. Under the hood it connects to Keras via the reticulate package, which is an interface to Python. I'm really happy that this is done by RStudio, and in particular by J.J. Allaire, which pretty much ensures high quality by default. So far I have used Keras in Python, but I will definitely give this implementation a try! You will find detailed information on their github.io page.

Installation

First you have to install the keras package from GitHub with the devtools package. If you don't already have devtools, you can simply install it from CRAN. The keras package comes with a function called install_tensorflow(). Since I had to troubleshoot some things during the installation, I looked into this function. It basically checks which OS and Python distribution you are using and, depending on that, creates a virtual environment called r-tensorflow. This did not work for me due to some proxy problems which I could not resolve. But I think I found an even better way: I already had a conda environment for TensorFlow that works with GPU support, so I just cloned it to r-tensorflow. And voilà: after manually installing some missing packages I had a working keras/TensorFlow toolchain with GPU support in R. Nice! :-)

# Get devtools
install.packages("devtools")

# Install the keras R package from GitHub
devtools::install_github("rstudio/keras")

# Set up the Python environment (creates the r-tensorflow env)
keras::install_tensorflow()
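
Since install_tensorflow() did not get through my proxy, here is a minimal sketch of the clone trick described above. It assumes you already have a working conda environment called tensorflow and that conda is on your PATH; r-tensorflow is the name the keras package looks for.

# Alternative to install_tensorflow(): clone an existing conda env
# (assumed to be called "tensorflow") to the expected name r-tensorflow
system("conda create --name r-tensorflow --clone tensorflow")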

Get your hands dirty!

Okay, now let's have some hello-world fun! Here comes an adapted MNIST CNN example from their GitHub site. First we set some parameters and fetch the data. Because of the proxy issue I first downloaded the MNIST data here.

library(keras)

batch_size <- 128
num_classes <- 10
epochs <- 10

# define input shape of pics (28 x 28 px in b/w)
input_shape <- c(28, 28, 1)

# load data
mnist <- dataset_mnist("D:/DATA/MNIST/mnist.npz")

The data already comes split into training and test sets, so all we have to do is bring it into shape and normalize the pixel values to [0, 1]. We also one-hot encode the labels to get y vectors of the correct dimension.

#train data
x_train <- array(
     as.numeric(mnist$train$x), 
     dim = c(dim(mnist$train$x)[[1]], 
             input_shape[1], 
             input_shape[2], 
             1)
     )/255

y_train <- to_categorical(mnist$train$y, num_classes)

#test data
x_test <- array(
     as.numeric(mnist$test$x), 
     dim = c(dim(mnist$test$x)[[1]], 
             input_shape[1], 
             input_shape[2], 
             1)
     )/255

y_test <- to_categorical(mnist$test$y, num_classes)
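
As a quick illustration of what to_categorical() does (my addition, not part of the original example):

# The label 3 becomes a length-10 indicator vector with the 1 in the
# fourth position, because the MNIST labels are zero-based
to_categorical(3, num_classes)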

As you can see, the training data array holds 60k samples of 28×28-pixel matrices with one channel. We have 60k training images and 10k test images.

cat('Shape of x_train:', dim(x_train), '\n')
## Shape of x_train: 60000 28 28 1
cat(dim(x_train)[[1]], 'train samples\n')
## 60000 train samples
cat(dim(x_test)[[1]], 'test samples\n')
## 10000 test samples

Now we can define the model. This part works pretty much the same as in Python, and the parameters are all the same; only the syntax differs: the so-called pipe operator is used here. There's no magic about it: model %>% layer_conv_2d(...) is the same as layer_conv_2d(model, ...). The idea of using pipelines was introduced (to R) by the magrittr package. It's something that's not really common in general-purpose programming, but IMHO piping is extremely useful for creating readable and reproducible data workflows. That's a whole different topic, though :-D. In Python the model has an .add method for adding layers; R goes for a more functional style here!
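
To see that equivalence outside of Keras, here is a tiny base-R/magrittr illustration (my addition):

library(magrittr)

# The pipe feeds the left-hand side in as the first argument,
# so both lines compute round(sqrt(2), 2)
round(sqrt(2), 2)
2 %>% sqrt() %>% round(2)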

model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu', 
                input_shape = input_shape) %>% 
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.2) %>% 
  layer_flatten() %>% 
  layer_dense(units = 128, activation = 'relu') %>% 
  layer_dropout(rate = 0.5) %>% 
  layer_dense(units = num_classes, activation = 'softmax')

Okay. So we created a CNN with Keras in R. Nothing fancy: just one convolutional layer with 32 filters, a 3×3 kernel and a rectified linear unit as activation, followed by 2×2 max pooling and 20% dropout for regularization. For classifying the features we use a dense layer with 128 neurons, again with rectified linear units, and a 50% dropout chance. Last but not least the softmax layer - of course without dropout ;-). Just like the model.summary() method you may know from Python, you can print a summary in R. Functional approach, of course :-D :

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_1 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)   (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## flatten_1 (Flatten)              (None, 5408)                  0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 128)                   692352      
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 10)                    1290        
## ===========================================================================
## Total params: 693,962
## Trainable params: 693,962
## Non-trainable params: 0
## ___________________________________________________________________________
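
As a quick sanity check on the parameter counts in the summary (my arithmetic, not part of the original post):

# conv2d_1: 3*3 kernel * 1 input channel * 32 filters + 32 biases
3 * 3 * 1 * 32 + 32       # 320

# dense_1: the flattened 13*13*32 = 5408 features feed 128 units
13 * 13 * 32 * 128 + 128  # 692352

# dense_2: 128 inputs into the 10 softmax classes
128 * 10 + 10             # 1290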

Compiling the model works as you would expect. For the sake of testing we use Adam with default values; they can be changed via the parameters of the optimizer_adam() function (a sketch of that follows below).

model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adam(),
  metrics = c('accuracy')
)
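
Just to illustrate that point, here is a hedged sketch of compiling with non-default Adam settings; it is not needed for the run below. Note that the learning-rate argument is called lr in keras versions from around the time of writing and learning_rate in newer ones.

model %>% compile(
  loss = loss_categorical_crossentropy,
  # lower learning rate than the default of 0.001; the betas are the Adam defaults
  optimizer = optimizer_adam(lr = 0.0005, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)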

So far so good. Let's try 10 epochs! (And have a look at the stopwatch.)

Start <- Sys.time()

history <- model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 2,
  validation_data = list(x_test, y_test)
)

cat("Wall time was:", "\n")
## Wall time was:
Sys.time() - Start
## Time difference of 1.157592 mins

About 7 seconds per epoch on my Quadro K2200. Not quite a deep learning workhorse, but not bad either! Let's have a look at the training progress:

plot(history)

Works like a charm! We did not tune anything, but let's check our validation hold-out nevertheless:

scores <- model %>% evaluate(
  x_test, y_test, verbose = 0
)

cat('Test loss:', scores[[1]], '\n', 
    'Test accuracy:', scores[[2]], '\n')
## Test loss: 2.294259 
##  Test accuracy: 0.1175

Well, at least some years ago that would have been worth a cake :-D

Why would someone do that?

Well, good question. If you really think about it, there is no big difference between using R or Python here: both are just extremely high-level interfaces to a very complex toolchain that reaches all the way down to your CUDA-enabled device. For me, creating the model in Python is not a big deal, but when it comes to data manipulation I have data-munging superpowers in R compared to Python. If you feel the same: have fun with the new Keras for R package :-) And by the way: since the model is just a wrapped Python object, you can also use the Python-style sequential API, for example model$add(Dense(units = 128, activation = 'relu')) - a rough sketch of that follows below.
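
Here is a rough sketch of what that could look like through reticulate (my interpretation; the Python-side names assume a standard Keras install in the active environment):

library(reticulate)

# The Dense layer has to come from the Python side
py_keras <- import("keras")

seq_model <- py_keras$models$Sequential()
seq_model$add(py_keras$layers$Dense(units = 128L, activation = "relu",
                                    input_shape = tuple(784L)))
seq_model$add(py_keras$layers$Dense(units = 10L, activation = "softmax"))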