Beyond Hello World, A Computer Vision Example: The R version

Hello! This is the second code walkthrough of the session Machine Learning Foundations where the awesome Laurence Moroney,a Developer Advocate at Google working on Artificial Intelligence, takes you through the fundamentals of building machine learning models using TensorFlow.

In this episode, Episode 2, Laurence Moroney takes us through yet another exciting application of Machine Learning. Here, we go beyond “Hello World” and start applying the fundamental patterns of building a neural network to more sophisticated scenarios, beginning with computer vision–how a computer can learn to “see.”

Like the previous R Notebook, this Notebook tries to replicate the Python Notebook used for this episode.

Before we begin, I highly recommend(99.99%) that you follow Episode 2 where Laurence Moroney demystifies some important concepts in computer vision and how to implement a NN on the same. I will try and highlight some of the stuff Laurence Moroney said and add some of my own for the sake of completeness but I highly recommend you listen from him first.

What is Computer Vision?

Simply put, it is the field of taking pixels and recognizing what’s in them. To train a NN to recognize the contents of images, we need data and we need labelled images. So where do we get these? Don’t fret, we’ll use the Fashion MNIST, an easily available dataset that comes with the Keras package.

Fashion MNIST dataset contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution 28 by 28 pixels (hence we can train a NN easily with them), as seen here:

Loading the required libraries and exploring the data

Let’s start by loading the libraries required for this session.

We’ll be requiring some packages(ggplot and tidyr) in the Tidyverse and Keras(a framework for defining a neural network as a set of Sequential layers). You can have them installed as follows

suppressMessages(install.packages("tidyverse"))
suppressMessages(install.packages("keras"))
suppressMessages(install_keras())

Ps: it could take a while

Once installed, let’s get rolling:

suppressPackageStartupMessages({
  library(tidyverse)
  library(keras)
  library(plotly)
})

We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access the Fashion MNIST directly from Keras.

mnist <- dataset_fashion_mnist()

c(training_images, training_labels) %<-% mnist$train
c(test_images, test_labels) %<-% mnist$test

# the train_images and train_labels arrays are the training set
# (the data the model uses to learn). 


# The model is tested against the test set: 
# the test_images, and test_labels arrays.

%<-% is known as the Multiple assignment operator. To explain it in simple terms, consider the code below:

c(a,b) %<-% c(1,2)
print(a)

## [1] 1

print(b)

## [1] 2

# a is assigned to 1 while b is assigned to 2 in just a single line of code, neat, right?

A little sanity check on our data

dim(training_images)

## [1] 60000    28    28

dim(training_labels)

## [1] 60000

# The above shows there are 60,000 images in the training set,
# with each image represented as 28 x 28 pixels and 60,000
# labels for each image

dim(test_images)

## [1] 10000    28    28

dim(test_labels)

## [1] 10000

# The above shows there are 10,000 images in the training set,
# with each image represented as 28 x 28 pixels and 10,000 
# labels for each image


# let's see the different categories of clothing in the labels
unique(test_labels)

##  [1] 9 2 1 6 4 5 7 3 8 0

# Each label is an integer between 0 and 9
#  Typing: ?dataset_fashion_mnist will show you the corresponding clothe.
# eg: 0 - T-shirt/top, 1 - Trouser, 9 - Ankle boot

label_names = c('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',  'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')

The corresponding integer labels and label name are as:

digit	label_names
0	T-shirt/top
1	Trouser
2	Pullover
3	Dress
4	Coat
5	Sandal
6	Shirt
7	Sneaker
8	Bag
9	Ankle boot

Normalizing the Data

Before going any further, let’s inspect the first image.

im_1 <- as.data.frame(training_images[1, ,]) 
# this takes the first image and its corresponding pixels and converts the data into a data frame(28 rows by 28 columns),
# a format commonly used by functions such as ggplot

colnames(im_1) <- seq_len(ncol(im_1))
# changes the column names from the default v1:v28 to 1:28

im_1 <- im_1 %>% mutate(y = seq_len(nrow(im_1)))
# adds a new column y with numbers running from 1 to 28. This will be the y axis in our plot

im_1 <- im_1 %>% gather(key = "x", value = "pixel_value", -y) %>%
          mutate(x = as.integer(x))
View(im_1)
# transforms the data from a "wide" to a "long" format such that
# every corresponding x and y axis integer has a pixel value assigned to it.

# let's get to plotting

im_1 <- im_1 %>%
  ggplot(aes(x = x, y = y, fill= pixel_value))+
  geom_tile()+
  scale_fill_gradient(low = "white", high = "black", na.value = NA)+
  scale_y_reverse()+
  theme_light()+
  theme(panel.grid = element_blank())+
  labs(x="", y="", fill = "pixel_value")

ggplotly(im_1)

# To understand what each "sub-plotting function does, 
# you could first run the code untill 'geom_tile' 
# then go on adding the rest of the functions sequentially 
 
# Experiment with different indices in the training_images array.
# For example, also take a look at index 42...that's a 
# different boot than the one at index 0

The website for the R interface to Keras contains a neatly done example of displaying the first 25 images of the training set and their corresponding class name.

To get a better grasp of the Tidyverse, my any time, any day recommendation is the R for Data Science book by Hadley Wickham and Garret Grolemund

If you inspect the first image in the training set (hover over it), you will see that the pixel values fall in the range of 0 to 255. If we are training a neural network, for various reasons(eg speeding up learning) it’s easier if we treat all values as between 0 and 1, a process called ‘normalizing’. For this, we simply divide by 255 for both the training set and the test set.

training_images <- training_images / 255
test_images <- test_images / 255

Building the model

To build the model, we will use the fundamental steps learnt in Episode one, that is:

Setting up the layers for our NN.
Making the network ready for training by compiling it.

Setting up the layers

# keras_model_sequential {keras}    creates a Keras Model composed of a linear stack of layers
model <- keras_model_sequential()

Let’s then add layers

model %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = 'relu') %>% 
  # to help in the exercises below
  # if you wanted to add another hidden layer, say consisting
  # of 256 neurons
  
  #layer_dense(units = 256, activation= 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')

Our NN consists of three layers. In our first layer, layer_flatten does the task of reformatting the data from a 2d-array (of 28 by 28 pixels), to a 1-d array of 28 * 28 = 784 pixels.

The data is then passed to our second layer (a hidden layer), which consists of 128 neurons. layer_dense adds a layer of neurons.

Each layer of neurons need an activation function to tell them what to do. There’s lots of options, but just use these for now.

relu (rectified linear unit) effectively means “If X>0 return X, else return 0” – so what it does it it only passes values 0 or greater to the next layer in the network.

The last layer, which is the output layer, consists of 10 neurons with each node returning a score that indicates the probability that the current image belongs to one of the 10 digit classes.

softmax turn logits (numeric output of the last linear layer of a multi-class classification neural network) into probabilities that add up to one.

Compile: Configure a Keras model for training

To make the network ready for training, we need to pick three more things, as part of the step: compilation

A loss function—How the network will be able to measure how good a job it’s doing on its training data, and thus how it will be able to steer itself in the right direction.
An optimizer—The mechanism through which the network will update itself based on the data it sees and its loss function.
Metrics to monitor during training and testing—Here we’ll only care about accuracy

model %>%
  compile(
   loss = 'sparse_categorical_crossentropy',
   optimizer = optimizer_adam(),
   metrics = c('accuracy')
  )

Training the Neural Network

This is the process of training the neural network, where it ‘learns’ the relationship between the train_images and train_labels arrays.

To start training, call the fit method — the model is “fit” to the training data for a fixed number of epochs.

history <- model %>% 
  fit(x = training_images,
      y = training_labels,
      epochs = 5)

history

## Trained on 60,000 samples (batch_size=32, epochs=5)
## Final epoch (plot to see history):
## loss: 0.2921
##  acc: 0.8922

Once it’s done training – you should see an accuracy value at the end of the final epoch. It might look something like 0.8918. This tells you that your neural network is about 88% accurate in classifying the training data. I.E., it figured out a pattern match between the image and the labels that worked 88% of the time. Not great, but not bad considering it was only trained for 5 epochs and done quite quickly.

Evaluating the model

This is the step where we evaluate how accurately the network learnt to classify the images.

metrics <- model %>%
            evaluate(x = test_images,
                     y= test_labels)

cat("Test loss", metrics$loss, "\n")

## Test loss 0.3427567

cat("Test accuracy", metrics$acc, "\n")

## Test accuracy 0.8788

For me, it returned an accuracy of about 0.8771 which is a little less than the accuracy on the training dataset. As you go through Lauren Moroney’s sessions, you’ll look at ways to improve this.

Making predictions using the model

With the model trained, we can use it to make predictions about some images.

predictions <- model %>% predict(test_images)

The model has just predicted a label for each image. Let’s see the predictions for the first image

predictions[1,]

##  [1] 1.102592e-06 1.673532e-08 2.155870e-08 3.702560e-09 5.950899e-07
##  [6] 2.787934e-03 3.041885e-06 2.538547e-02 2.148213e-06 9.718196e-01

# Huh? Why are there 10 values? Hold that thought.

For each image, a prediction of an array of 10 numbers is made. These numbers are a probability that the image being classified is the corresponding image. So which label had the highest probability?

which.max(predictions[1,])

## [1] 10

# Aha! As the labels are 0-based (start from 0-9),
# this actually means a that the highest probability was assigned to label 9.
# Let's confirm this by checking at the first label in the
# `test_labels` array

test_labels[1]

## [1] 9

# Voila!! The actual label was 9.

# if we wanted to obtain only the predicted classes
# without the probabilities:

pred_class <- model %>% predict_classes(test_images)

pred_class[1] # To see the predicted class of the first image

## [1] 9

You have come this far. You Rock!! We have seen how to classify items of clothing from fashion MNIST.

After going through the below exercises, you will be well primed to attempt the Exercise that comes with this episode.

If you have any questions, get them to Laurence by dropping them in this Episode’s comments.

Exploration exercises

The below exercises are from the Python Notebook used for this episode.

To explore further, try the below exercises:

Exercise 1

For this first exercise run the below code: It creates a set of classifications for each of the test images, and then prints the first entry in the classifications. The output, after you run it is a list of numbers. Why do you think this is, and what do those numbers represent?

classifications <- model %>% predict(test_images)

Here, the model has predicted the label for each image in the testing set. Let’s take a look at the first prediction:

classifications[1,]

##  [1] 1.102592e-06 1.673532e-08 2.155870e-08 3.702560e-09 5.950899e-07
##  [6] 2.787934e-03 3.041885e-06 2.538547e-02 2.148213e-06 9.718196e-01

# another hint
print(test_labels[1]) # and you'll get a 9. Does that help you understand why this list looks the way it does?

## [1] 9

# Alternatively, we can also directly get the class prediction:
class_pred <- model %>% predict_classes(test_images)
class_pred[1:10]

##  [1] 9 2 1 1 6 1 4 6 5 7

What does this list represent?

It’s 10 random meaningless values
It’s the first 10 classifications that the computer made
It’s the probability that this item is each of the 10 classes

Answer The correct answer is (3)

For each image, a prediction of an array of 10 numbers is made. These numbers are a probability that the image being classified is the corresponding image.

How do you know that this list tells you that the item is an ankle boot?

There’s not enough information to answer that question
The 10th element on the list is the biggest, and the ankle boot is labelled 9
The ankle boot is label 9, and there are 0->9 elements in the list

hint

which.max(classifications[1,])

## [1] 10

# we can see that the tenth element has the maximum confidence value

Answer

The correct answer is (2). Both the list and the labels are 0 based, so the ankle boot having label 9 means that it is the 10th of the 10 classes. The list having the 10th element being the highest value means that the Neural Network has predicted that the item it is classifying is most likely an ankle boot

Exercise 2

Let’s now look at the layers in your model. Experiment with different values for the dense layer with 512 neurons. What different results do you get for loss, training time etc? Why do you think that’s the case?

Hint To increase the number of neurons in our hidden layer, we change units = 512 from the previous 128. Everything else remains the same

Question 1. Increase to 1024 Neurons – What’s the impact?

Training takes longer, but is more accurate
Training takes longer, but no impact on accuracy
Training takes the same time, but is more accurate

Answer

The correct answer is (1) by adding more Neurons we have to do more calculations, slowing down the process, but in this case they have a good impact – we do get more accurate. That doesn’t mean it’s always a case of ‘more is better’, you can hit the law of diminishing returns very quickly!

Exercise 3

What would happen if you remove the Flatten() layer. Why do you think that’s the case?

Answer

You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to ‘flatten’ that 28,28 into a 784x1. Instead of wriitng all the code to handle that ourselves, we add the layer_flatten() layer at the begining, and when the arrays are loaded into the model later, they’ll automatically be flattened for us.

Exercise 4

Consider the final (output) layers. Why are there 10 of them? What would happen if you had a different amount than 10? For example, try training the network with 5

Answer

You get an error as soon as it finds an unexpected value. Another rule of thumb – the number of neurons in the last layer should match the number of classes you are classifying for. In this case it’s the digits 0-9, so there are 10 of them, hence you should have 10 neurons in your final layer.

Exercise 5

Consider the effects of additional layers in the network. What will happen if you add another layer between the one with 512 and the final layer with 10.

Answer

There isn’t a significant impact – because this is relatively simple data. For far more complex data (including color images to be classified as flowers that you’ll see in the next lesson), extra layers are often necessary.

Exercise 6

Consider the impact of training for more or less epochs. Why do you think that would be the case?

Try 15 epochs – you’ll probably get a model with a much better loss than the one with 5 Try 30 epochs – you might see the loss value stops decreasing, and sometimes increases. This is a side effect of something called ‘overfitting’ which you can learn about [somewhere] and it’s something you need to keep an eye out for when training neural networks. There’s no point in wasting your time training if you aren’t improving your loss, right! :)

Exercise 7

Before you trained, you normalized the data, going from values that were 0-255 to values that were 0-1. What would be the impact of removing that? Comment out:

training_images <- training_images / 255
test_images <- test_images / 255 give it a try

then give it a try. Why do you think you get different results?

Exercise 8: Keras callbacks

Earlier when you trained for extra epochs you had an issue where your loss might change. It might have taken a bit of time for you to wait for the training to do that, and you might have thought ‘wouldn’t it be nice if I could stop the training when I reach a desired value?’ – i.e. 95% accuracy might be enough for you, and if you reach that after 3 epochs, why sit around waiting for it to finish a lot more epochs….So how would you fix that? Like any other program…you have callbacks! Let’s see them in action…

Keras callbacks

A callback is an object that is passed to the model in the call to and that is called by the fit model at various points during training. It has access to all the data available about the state of the model and its performance, and it can take action: interrupt training, save a model, load a different weight set, or otherwise alter the state of the model.

Keras includes a number of built-in callbacks. For this exercise, we will build our own callback which stops the model from training once a desired accuracy is attained.

Custom Callbacks

You can create a custom callback by creating a new R6 class that inherits from the KerasCallback class.

library(R6)

library(R6)

# define custom callback class
train_stop <- R6::R6Class("train_stop",
 inherit = KerasCallback,
 public = list(
   on_epoch_end = function(epoch, logs = list()){
     if(logs$'acc'>0.9){
       self$model$stop_training = TRUE
       paste0("Reached 90% accuracy so cancelling training!")
     }
   }
                            
                          ))

That done, we just have to create an instance of the callback and attach the callback to model training as below:

# loading the required packages
suppressPackageStartupMessages({
  library(tidyverse)
  library(keras)
  library(plotly)
})

# importing the datasets directly from Keras
mnist <- dataset_fashion_mnist()
c(training_images, training_labels) %<-% mnist$train
c(test_images, test_labels) %<-% mnist$test

# Normalizing the pixel values to fall in the range between 0 and 1
training_images <- training_images / 255
test_images <- test_images / 255


# creates an instance of the callback
callback <- train_stop$new()

# Setting up the layers
model <- keras_model_sequential()
model %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')

# Compile the model
model %>%
  compile(
   loss = 'sparse_categorical_crossentropy',
   optimizer = optimizer_adam(),
   metrics = c('accuracy')
  )

# Training the NN
model %>% 
  fit(x = training_images,
      y = training_labels,
      epochs = 5,
      # Attach the callback to model training
      callbacks = list(callback)
      )

# Evaluating the model on test data
metrics <- model %>%
            evaluate(x = test_images,
                     y= test_labels)

cat("Test loss", metrics$loss, "\n")
cat("Test accuracy", metrics$acc, "\n")

# Making predictions
classifications <- model %>% predict(test_images)
classifications[1,]
print(test_labels[1])

Reference Material

Machine Learning Foundations: Ep #2 - First steps in computer vision
Deep Learning with R by Francois Chollet and J.J.Allaire
The R interface to Keras website.
Copy of Lab2- Computer-Vision.ipynb
Exericise 2 for this episode-Question.ipynb