library(keras)
The classification of images is best managed by convolutional neural networks (CNNs). Before working with novel images, it is instructive to look at the built-in images provided as datasets in Keras.
The MNIST dataset contains small \(28 \times 28\) pixel greyscale images of handwritten digits that have been classified by humans.
This dataset can be directly imported for use.
The dataset_mnist() function is built into Keras and its return value can be assigned to a variable.
mnist <- dataset_mnist()
The dataset has already been divided into a training and test set, each with a set of feature variables (the images) and a set of target values.
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
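A quick look at the raw target values confirms that they are the digits themselves (with the standard MNIST ordering, the first training label is a \(5\)).
head(y_train)
## [1] 5 0 4 1 9 2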
The dimensions of the training feature set (the images) are given below.
dim(x_train)
## [1] 60000 28 28
Note that there are \(60000\) greyscale images, each of pixel size \(28 \times 28\). These values are assigned to variables below.
img_rows <- 28
img_cols <- 28
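Before reshaping, a single image can be rendered directly with base R graphics. The sketch below uses the image() function; the row reversal and transpose compensate for how image() orients a matrix.
# Render the first training image (rows reversed and transposed for correct orientation)
digit <- x_train[1, , ]
image(t(digit[img_rows:1, ]),
      col = gray(seq(0, 1, length.out = 256)),
      axes = FALSE)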
These images are not in the correct shape as tensors, as the number of channels is missing. This can be corrected for both the training and test sets by using the array_reshape() function. The code below also creates the input_shape variable to hold the correct dimensions of the images.
x_train <- array_reshape(x_train,
                         c(nrow(x_train), img_rows, img_cols, 1))
x_test <- array_reshape(x_test,
                        c(nrow(x_test), img_rows, img_cols, 1))
input_shape <- c(img_rows, img_cols, 1)
dim(x_train)
## [1] 60000 28 28 1
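The same check on the test set confirms its shape; the standard MNIST split holds \(10000\) test images.
dim(x_test)
## [1] 10000 28 28 1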
As with all neural networks thus far, the data must be normalized. Since the pixel values represent brightness on a scale from \(0\) (black) to \(255\) (white), they can all be rescaled by dividing each by the maximum value of \(255\).
x_train <- x_train / 255
x_test <- x_test / 255
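A quick range check verifies the rescaling; all pixel values now lie in \([0, 1]\).
range(x_train)
## [1] 0 1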
The sample space of the target variable contains \(10\) elements, i.e. there are \(10\) classes. These can be one-hot encoded using the to_categorical() function.
num_classes <- 10
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
The first image in the training set is a \(5\); since the class count starts at zero, the \(1\) appears in the sixth position of its one-hot encoded vector.
y_train[1,]
## [1] 0 0 0 0 0 1 0 0 0 0
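The encoding can be reversed by locating the \(1\) and subtracting one, since R indexing starts at \(1\) while the digit classes start at \(0\).
# Recover the digit from its one-hot vector
which.max(y_train[1, ]) - 1
## [1] 5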
Below is a simple CNN. It contains a single convolutional layer with \(16\) filters, each of size \(3 \times 3\), using the rectified linear unit (ReLU) activation function.
This is followed by a max pooling layer with a grid size of \(2 \times 2\). Next is a dropout layer with a rate of \(0.25\).
The resulting feature maps are flattened before passing through a densely connected layer with \(10\) nodes, again using the ReLU activation function. A dropout rate of \(0.5\) is used to combat overfitting. The output layer has \(10\) nodes (one per class) and uses the softmax activation function.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3),
                activation = 'relu',
                input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 10,
              activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes,
              activation = 'softmax')
A summary of the model shows \(27320\) learnable parameters.
model %>% summary()
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_1 (Conv2D) (None, 26, 26, 16) 160
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 13, 13, 16) 0
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 13, 13, 16) 0
## ___________________________________________________________________________
## flatten_1 (Flatten) (None, 2704) 0
## ___________________________________________________________________________
## dense_1 (Dense) (None, 10) 27050
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 10) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 10) 110
## ===========================================================================
## Total params: 27,320
## Trainable params: 27,320
## Non-trainable params: 0
## ___________________________________________________________________________
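These totals can be verified by hand. The convolutional layer has \((3 \times 3 \times 1 + 1) \times 16 = 160\) parameters (nine weights per filter for the single input channel, plus a bias). Flattening the \(13 \times 13 \times 16\) pooled feature maps yields \(2704\) values, so the first dense layer has \((2704 + 1) \times 10 = 27050\) parameters, and the output layer adds \((10 + 1) \times 10 = 110\), giving \(160 + 27050 + 110 = 27320\) in total.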
Categorical cross-entropy serves as the loss function, Adadelta as the optimizer, and accuracy as the metric.
model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)
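Note that the loss, optimizer, and metrics can equivalently be specified as strings, which Keras resolves to the same objects. A minimal alternative:
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adadelta',
  metrics = c('accuracy')
)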
A mini-batch size of \(128\) allows the tensors to fit into the memory of the NVIDIA graphics processing unit of the current machine. The model will train for \(12\) epochs, with a validation split of \(0.2\).
batch_size <- 128
epochs <- 12
# Train model
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)
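The fit() function returns a history object. Assigning the call to a variable, as sketched below, keeps the per-epoch metrics so that the training and validation curves can be plotted afterwards with the plot() method.
# Capture the per-epoch metrics while training
history <- model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)
# Plot the loss and accuracy curves for the training and validation sets
plot(history)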
The model can be evaluated using the test data.
score <- model %>% evaluate(x_test, y_test)
cat('Test loss: ', score$loss, "\n")
## Test loss: 0.1820279
cat('Test accuracy: ', score$acc, "\n")
## Test accuracy: 0.9645
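Class probabilities for individual images can be obtained with the predict() function; the most likely digit is the position of the largest probability, minus one. A minimal sketch for the first test image:
# Class probabilities for the first test image (kept as a 1 x 28 x 28 x 1 tensor)
probs <- model %>% predict(x_test[1, , , , drop = FALSE])
# The predicted class is the index of the largest probability, minus one
which.max(probs) - 1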