library(keras)
The classification of images is best managed by convolutional neural networks (CNNs). Before working with novel images, it is instructive to look at the built-in images provided as datasets in Keras.
The MNIST dataset contains small \(28 \times 28\) pixel greyscale images of handwritten digits that have been classified by humans.
This dataset can be directly imported for use.
The dataset_mnist() function is built into Keras and its return value can be assigned to a variable.
mnist <- dataset_mnist()
The dataset has already been divided into a training and test set, each with a set of feature variables (the images) and a set of target values.
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
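A quick look at the raw target values confirms that they are the digits themselves (with the standard MNIST ordering, the first training label is a \(5\)).
head(y_train)
## [1] 5 0 4 1 9 2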
The dimensions of the training feature set (the images) are given below.
dim(x_train)
## [1] 60000 28 28
Note that there are \(60000\) greyscale images, each of pixel size \(28 \times 28\). These values are assigned to variables below.
img_rows <- 28
img_cols <- 28
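Before reshaping, a single image can be rendered directly with base R graphics. The sketch below uses the image() function; the row reversal and transpose compensate for how image() orients a matrix.
# Render the first training image (rows reversed and transposed for correct orientation)
digit <- x_train[1, , ]
image(t(digit[img_rows:1, ]),
      col = gray(seq(0, 1, length.out = 256)),
      axes = FALSE)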
These images are not in the correct shape as tensors, as the number of channels is missing. This can be corrected for both the training and test sets by using the array_reshape() function. The code below also creates the input_shape variable to hold the correct dimensions of the images.
x_train <- array_reshape(x_train,
                         c(nrow(x_train), img_rows, img_cols, 1))
x_test <- array_reshape(x_test,
                        c(nrow(x_test), img_rows, img_cols, 1))
input_shape <- c(img_rows, img_cols, 1)
dim(x_train)
## [1] 60000 28 28 1
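The same check on the test set confirms its shape; the standard MNIST split holds \(10000\) test images.
dim(x_test)
## [1] 10000 28 28 1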
As with all neural networks thus far, the data must be normalized. Since the pixel values represent brightness on a scale from \(0\) (black) to \(255\) (white), they can all be rescaled by dividing each by the maximum value of \(255\).
x_train <- x_train / 255
x_test <- x_test / 255
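A quick range check verifies the rescaling; all pixel values now lie in \([0, 1]\).
range(x_train)
## [1] 0 1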
The sample space of the target variable contains \(10\) elements, i.e. there are \(10\) classes. These can be one-hot encoded using the to_categorical() function.
num_classes <- 10
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
The first image in the training set is a \(5\); since the class count starts at zero, the \(1\) appears in the sixth position of its one-hot encoded vector.
y_train[1,]
## [1] 0 0 0 0 0 1 0 0 0 0
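The encoding can be reversed by locating the \(1\) and subtracting one, since R indexing starts at \(1\) while the digit classes start at \(0\).
# Recover the digit from its one-hot vector
which.max(y_train[1, ]) - 1
## [1] 5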
Below is a simple CNN. It contains a single convolutional layer with \(16\) filters, each of size \(3 \times 3\), using the rectified linear unit (ReLU) activation function.
This is followed by a max pooling layer with a grid size of \(2 \times 2\). Next is a dropout layer with a rate of \(0.25\).
The resulting feature maps are flattened before passing through a densely connected layer with \(10\) nodes, again using the ReLU activation function. A dropout rate of \(0.5\) is used to combat overfitting. The output layer has \(10\) nodes (one per class) and uses the softmax activation function.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3),
                activation = 'relu',
                input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 10,
              activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes,
              activation = 'softmax')
A summary of the model shows \(27320\) learnable parameters.
model %>% summary()
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_1 (Conv2D) (None, 26, 26, 16) 160
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 13, 13, 16) 0
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 13, 13, 16) 0
## ___________________________________________________________________________
## flatten_1 (Flatten) (None, 2704) 0
## ___________________________________________________________________________
## dense_1 (Dense) (None, 10) 27050
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 10) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 10) 110
## ===========================================================================
## Total params: 27,320
## Trainable params: 27,320
## Non-trainable params: 0
## ___________________________________________________________________________
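These totals can be verified by hand. The convolutional layer has \((3 \times 3 \times 1 + 1) \times 16 = 160\) parameters (nine weights per filter for the single input channel, plus a bias). Flattening the \(13 \times 13 \times 16\) pooled feature maps yields \(2704\) values, so the first dense layer has \((2704 + 1) \times 10 = 27050\) parameters, and the output layer adds \((10 + 1) \times 10 = 110\), giving \(160 + 27050 + 110 = 27320\) in total.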
Categorical cross-entropy serves as the loss function, Adadelta as the optimizer, and accuracy as the metric.
model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)
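Note that the loss, optimizer, and metrics can equivalently be specified as strings, which Keras resolves to the same objects. A minimal alternative:
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adadelta',
  metrics = c('accuracy')
)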
A mini-batch size of \(128\) allows the tensors to fit into the memory of the NVIDIA graphics processing unit of the current machine. The model will train for \(12\) epochs, with a validation split of \(0.2\).
batch_size <- 128
epochs <- 12
# Train model
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)
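The fit() function returns a history object. Assigning the call to a variable, as sketched below, keeps the per-epoch metrics so that the training and validation curves can be plotted afterwards with the plot() method.
# Capture the per-epoch metrics while training
history <- model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)
# Plot the loss and accuracy curves for the training and validation sets
plot(history)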
The model can be evaluated using the test data.
score <- model %>% evaluate(x_test, y_test)
cat('Test loss: ', score$loss, "\n")
## Test loss: 0.1820279
cat('Test accuracy: ', score$acc, "\n")
## Test accuracy: 0.9645
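Class probabilities for individual images can be obtained with the predict() function; the most likely digit is the position of the largest probability, minus one. A minimal sketch for the first test image:
# Class probabilities for the first test image (kept as a 1 x 28 x 28 x 1 tensor)
probs <- model %>% predict(x_test[1, , , , drop = FALSE])
# The predicted class is the index of the largest probability, minus one
which.max(probs) - 1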