Beyond Hello World, A Computer Vision Example: The R version

Hello! This is the second code walkthrough of the session Machine Learning Foundations where the awesome Laurence Moroney,a Developer Advocate at Google working on Artificial Intelligence, takes you through the fundamentals of building machine learning models using TensorFlow.

In this episode, Episode 2, Laurence Moroney takes us through yet another exciting application of Machine Learning. Here, we go beyond “Hello World” and start applying the fundamental patterns of building a neural network to more sophisticated scenarios, beginning with computer vision–how a computer can learn to “see.”

Like the previous R Notebook, this Notebook tries to replicate the Python Notebook used for this episode.

Before we begin, I highly recommend(99.99%) that you follow Episode 2 where Laurence Moroney demystifies some important concepts in computer vision and how to implement a NN on the same. I will try and highlight some of the stuff Laurence Moroney said and add some of my own for the sake of completeness but I highly recommend you listen from him first.

What is Computer Vision?

Simply put, it is the field of taking pixels and recognizing what’s in them. To train a NN to recognize the contents of images, we need data and we need labelled images. So where do we get these? Don’t fret, we’ll use the Fashion MNIST, an easily available dataset that comes with the Keras package.

Fashion MNIST dataset contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution 28 by 28 pixels (hence we can train a NN easily with them), as seen here:

Loading the required libraries and exploring the data

Let’s start by loading the libraries required for this session.

We’ll be requiring some packages(ggplot and tidyr) in the Tidyverse and Keras(a framework for defining a neural network as a set of Sequential layers). You can have them installed as follows

suppressMessages(install.packages("tidyverse"))
suppressMessages(install.packages("keras"))
suppressMessages(install_keras())

Ps: it could take a while

Once installed, let’s get rolling:

We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access the Fashion MNIST directly from Keras.

%<-% is known as the Multiple assignment operator. To explain it in simple terms, consider the code below:

## [1] 1
## [1] 2

A little sanity check on our data

## [1] 60000    28    28
## [1] 60000
## [1] 10000    28    28
## [1] 10000
##  [1] 9 2 1 6 4 5 7 3 8 0

The corresponding integer labels and label name are as:

digit label_names
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

Normalizing the Data

Before going any further, let’s inspect the first image.

The website for the R interface to Keras contains a neatly done example of displaying the first 25 images of the training set and their corresponding class name.

To get a better grasp of the Tidyverse, my any time, any day recommendation is the R for Data Science book by Hadley Wickham and Garret Grolemund

If you inspect the first image in the training set (hover over it), you will see that the pixel values fall in the range of 0 to 255. If we are training a neural network, for various reasons(eg speeding up learning) it’s easier if we treat all values as between 0 and 1, a process called ‘normalizing’. For this, we simply divide by 255 for both the training set and the test set.

Building the model

To build the model, we will use the fundamental steps learnt in Episode one, that is:

  • Setting up the layers for our NN.
  • Making the network ready for training by compiling it.

Setting up the layers

Let’s then add layers

Our NN consists of three layers. In our first layer, layer_flatten does the task of reformatting the data from a 2d-array (of 28 by 28 pixels), to a 1-d array of 28 * 28 = 784 pixels.

The data is then passed to our second layer (a hidden layer), which consists of 128 neurons. layer_dense adds a layer of neurons.

Each layer of neurons need an activation function to tell them what to do. There’s lots of options, but just use these for now.

relu (rectified linear unit) effectively means “If X>0 return X, else return 0” – so what it does it it only passes values 0 or greater to the next layer in the network.

The last layer, which is the output layer, consists of 10 neurons with each node returning a score that indicates the probability that the current image belongs to one of the 10 digit classes.

softmax turn logits (numeric output of the last linear layer of a multi-class classification neural network) into probabilities that add up to one.

Compile: Configure a Keras model for training

To make the network ready for training, we need to pick three more things, as part of the step: compilation

  • A loss function—How the network will be able to measure how good a job it’s doing on its training data, and thus how it will be able to steer itself in the right direction.

  • An optimizer—The mechanism through which the network will update itself based on the data it sees and its loss function.

  • Metrics to monitor during training and testing—Here we’ll only care about accuracy

Training the Neural Network

This is the process of training the neural network, where it ‘learns’ the relationship between the train_images and train_labels arrays.

To start training, call the fit method — the model is “fit” to the training data for a fixed number of epochs.

## Trained on 60,000 samples (batch_size=32, epochs=5)
## Final epoch (plot to see history):
## loss: 0.2921
##  acc: 0.8922

Once it’s done training – you should see an accuracy value at the end of the final epoch. It might look something like 0.8918. This tells you that your neural network is about 88% accurate in classifying the training data. I.E., it figured out a pattern match between the image and the labels that worked 88% of the time. Not great, but not bad considering it was only trained for 5 epochs and done quite quickly.

Evaluating the model

This is the step where we evaluate how accurately the network learnt to classify the images.

## Test loss 0.3427567
## Test accuracy 0.8788

For me, it returned an accuracy of about 0.8771 which is a little less than the accuracy on the training dataset. As you go through Lauren Moroney’s sessions, you’ll look at ways to improve this.

Making predictions using the model

With the model trained, we can use it to make predictions about some images.

The model has just predicted a label for each image. Let’s see the predictions for the first image

##  [1] 1.102592e-06 1.673532e-08 2.155870e-08 3.702560e-09 5.950899e-07
##  [6] 2.787934e-03 3.041885e-06 2.538547e-02 2.148213e-06 9.718196e-01

For each image, a prediction of an array of 10 numbers is made. These numbers are a probability that the image being classified is the corresponding image. So which label had the highest probability?

## [1] 10
## [1] 9
## [1] 9

You have come this far. You Rock!! We have seen how to classify items of clothing from fashion MNIST.

After going through the below exercises, you will be well primed to attempt the Exercise that comes with this episode.

If you have any questions, get them to Laurence by dropping them in this Episode’s comments.

Exploration exercises

The below exercises are from the Python Notebook used for this episode.

To explore further, try the below exercises:

Exercise 1

For this first exercise run the below code: It creates a set of classifications for each of the test images, and then prints the first entry in the classifications. The output, after you run it is a list of numbers. Why do you think this is, and what do those numbers represent?

Here, the model has predicted the label for each image in the testing set. Let’s take a look at the first prediction:

##  [1] 1.102592e-06 1.673532e-08 2.155870e-08 3.702560e-09 5.950899e-07
##  [6] 2.787934e-03 3.041885e-06 2.538547e-02 2.148213e-06 9.718196e-01
## [1] 9
##  [1] 9 2 1 1 6 1 4 6 5 7

What does this list represent?

  1. It’s 10 random meaningless values
  2. It’s the first 10 classifications that the computer made
  3. It’s the probability that this item is each of the 10 classes

Answer The correct answer is (3)

For each image, a prediction of an array of 10 numbers is made. These numbers are a probability that the image being classified is the corresponding image.

How do you know that this list tells you that the item is an ankle boot?

  1. There’s not enough information to answer that question
  2. The 10th element on the list is the biggest, and the ankle boot is labelled 9
  3. The ankle boot is label 9, and there are 0->9 elements in the list

hint

## [1] 10

Answer

The correct answer is (2). Both the list and the labels are 0 based, so the ankle boot having label 9 means that it is the 10th of the 10 classes. The list having the 10th element being the highest value means that the Neural Network has predicted that the item it is classifying is most likely an ankle boot

Exercise 2

Let’s now look at the layers in your model. Experiment with different values for the dense layer with 512 neurons. What different results do you get for loss, training time etc? Why do you think that’s the case?

Hint To increase the number of neurons in our hidden layer, we change units = 512 from the previous 128. Everything else remains the same

Question 1. Increase to 1024 Neurons – What’s the impact?

  1. Training takes longer, but is more accurate
  2. Training takes longer, but no impact on accuracy
  3. Training takes the same time, but is more accurate

Answer

The correct answer is (1) by adding more Neurons we have to do more calculations, slowing down the process, but in this case they have a good impact – we do get more accurate. That doesn’t mean it’s always a case of ‘more is better’, you can hit the law of diminishing returns very quickly!

Exercise 3

What would happen if you remove the Flatten() layer. Why do you think that’s the case?

Answer

You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to ‘flatten’ that 28,28 into a 784x1. Instead of wriitng all the code to handle that ourselves, we add the layer_flatten() layer at the begining, and when the arrays are loaded into the model later, they’ll automatically be flattened for us.

Exercise 4

Consider the final (output) layers. Why are there 10 of them? What would happen if you had a different amount than 10? For example, try training the network with 5

Answer

You get an error as soon as it finds an unexpected value. Another rule of thumb – the number of neurons in the last layer should match the number of classes you are classifying for. In this case it’s the digits 0-9, so there are 10 of them, hence you should have 10 neurons in your final layer.

Exercise 5

Consider the effects of additional layers in the network. What will happen if you add another layer between the one with 512 and the final layer with 10.

Answer

There isn’t a significant impact – because this is relatively simple data. For far more complex data (including color images to be classified as flowers that you’ll see in the next lesson), extra layers are often necessary.

Exercise 6

Consider the impact of training for more or less epochs. Why do you think that would be the case?

Try 15 epochs – you’ll probably get a model with a much better loss than the one with 5 Try 30 epochs – you might see the loss value stops decreasing, and sometimes increases. This is a side effect of something called ‘overfitting’ which you can learn about [somewhere] and it’s something you need to keep an eye out for when training neural networks. There’s no point in wasting your time training if you aren’t improving your loss, right! :)

Exercise 7

Before you trained, you normalized the data, going from values that were 0-255 to values that were 0-1. What would be the impact of removing that? Comment out:

training_images <- training_images / 255
test_images <- test_images / 255 give it a try

then give it a try. Why do you think you get different results?

Exercise 8: Keras callbacks

Earlier when you trained for extra epochs you had an issue where your loss might change. It might have taken a bit of time for you to wait for the training to do that, and you might have thought ‘wouldn’t it be nice if I could stop the training when I reach a desired value?’ – i.e. 95% accuracy might be enough for you, and if you reach that after 3 epochs, why sit around waiting for it to finish a lot more epochs….So how would you fix that? Like any other program…you have callbacks! Let’s see them in action…

Keras callbacks

A callback is an object that is passed to the model in the call to and that is called by the fit model at various points during training. It has access to all the data available about the state of the model and its performance, and it can take action: interrupt training, save a model, load a different weight set, or otherwise alter the state of the model.

Keras includes a number of built-in callbacks. For this exercise, we will build our own callback which stops the model from training once a desired accuracy is attained.

Custom Callbacks

You can create a custom callback by creating a new R6 class that inherits from the KerasCallback class.

That done, we just have to create an instance of the callback and attach the callback to model training as below:

Reference Material