Image Recognition in R using MXNet

Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, animals, writing and actions in images. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition.

In this project we will use the MXNet framework to distinguish dogs from cats. MXNet is a deep learning framework designed for both efficiency and flexibility.

We will use the dataset from the Kaggle competition Dogs vs. Cats.

1 Install the package

To follow this project you will need RStudio installed, along with the MXNet package. You can refer to this Medium article for more details.
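For reference, here is a minimal CPU-only install sketch along the lines of the MXNet R documentation at the time of writing (the "dmlc" repository URL is the one the docs listed then and may have changed since):

## Add the MXNet repository and install the CPU build of the package
cran <- getOption("repos")
cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/"
options(repos = cran)
install.packages("mxnet")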

2 Data Preprocessing

Download the training dataset from https://www.kaggle.com/c/dogs-vs-cats/data.

First set the directory for the images:

## Directory for images
image_dir <- "/Users/Cartwheel/Downloads/train"

source("https://bioconductor.org/biocLite.R")
## Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
biocLite("EBImage")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## Installing package(s) 'EBImage'
## 
## The downloaded binary packages are in
##  /var/folders/91/82q46qr51bl4_k18yrfgsq5r0000gn/T//RtmpfzCI7X/downloaded_packages

EBImage provides general purpose functionality for image processing and analysis. To install this package:

Read and display some examples from the training set:

library(EBImage)
example_cat_image <- readImage(file.path(image_dir, "cat.1.jpg"))
display(example_cat_image)

example_dog_image <- readImage(file.path(image_dir, "dog.3.jpg"))
display(example_dog_image)

As a quick example, I will use EBImage to resize the images to 28×28 and convert them to grayscale so they can be loaded into R easily. A helper function will process the cat and dog images separately. Each image becomes a vector of length 784 (28 × 28), with each element representing the intensity of one pixel.
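As a quick sanity check on that arithmetic, resizing one of the example images and flattening it should yield exactly 784 values (this reuses example_cat_image loaded above):

## One image -> length-784 vector: resize, grayscale, transpose, flatten
img_small <- channel(resize(example_cat_image, w = 28, h = 28), "gray")
img_vec <- as.vector(t(imageData(img_small)))
length(img_vec)  ## 784 = 28 * 28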

## Set width and height for resizing images
width <- 28
height <- 28
## pbapply adds progress bars to the *apply family of functions;
## pblapply is a drop-in replacement for lapply
library(pbapply)
extract_feature <- function(dir_path, width, height, is_cat = TRUE, add_label = TRUE) {
  img_size <- width*height
  ## List images in path
  images_names <- list.files(dir_path)
  if (add_label) {
    ## Select only cats or dogs images
    images_names <- images_names[grepl(ifelse(is_cat, "cat", "dog"), images_names)]
    ## Set label, cat = 0, dog = 1
    label <- ifelse(is_cat, 0, 1)
  }
  print(paste("Start processing", length(images_names), "images"))
  ## Resize each image, convert it to grayscale and flatten it to a vector
  feature_list <- pblapply(images_names, function(imgname) {
    ## Read image
    img <- readImage(file.path(dir_path, imgname))
    ## Resize image
    img_resized <- resize(img, w = width, h = height)
    ## Set to grayscale
    grayimg <- channel(img_resized, "gray")
    ## Get the pixel data as a matrix
    img_matrix <- imageData(grayimg)
    ## Coerce to a vector
    img_vector <- as.vector(t(img_matrix))
    return(img_vector)
  })
  ## bind the list of vector into matrix
  feature_matrix <- do.call(rbind, feature_list)
  feature_matrix <- as.data.frame(feature_matrix)
  ## Set names
  names(feature_matrix) <- paste0("pixel", c(1:img_size))
  if (add_label) {
    ## Add label
    feature_matrix <- cbind(label = label, feature_matrix)
  }
  return(feature_matrix)
}

Process the cat and dog images separately and save each into a data frame:

cats_data <- extract_feature(dir_path = image_dir, width = width, height = height)
## [1] "Start processing 12500 images"
dogs_data <- extract_feature(dir_path = image_dir, width = width, height = height, is_cat = FALSE)
## [1] "Start processing 12500 images"
dim(cats_data)
## [1] 12500   785
dim(dogs_data)
## [1] 12500   785

In all, there are 12,500 images of dogs and 12,500 images of cats. Each row holds the 784 pixel values plus a label column, hence the 785 columns.

Save the data just in case:

saveRDS(cats_data, "cat.rds")
saveRDS(dogs_data, "dog.rds")
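## The saved .rds files can be reloaded later with readRDS(), e.g.
## cats_data <- readRDS("cat.rds"), so the processing step can be skipped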
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
## Bind rows into a single dataset
complete_set <- rbind(cats_data, dogs_data)
## test/training partitions
training_index <- createDataPartition(complete_set$label, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- complete_set[training_index,]
dim(train_set)
## [1] 22500   785

Here we split the data in two: 90% for training and 10% for testing.

test_set <- complete_set[-training_index,]
dim(test_set)
## [1] 2500  785
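Since createDataPartition samples in a stratified way, both partitions should keep the roughly 50/50 cat/dog balance. A quick check (output omitted):

## Tabulate the labels in each partition; counts should be close to 50/50
table(train_set$label)
table(test_set$label)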

Reshape the data into the 4-D array format (width, height, channel, sample) that the model requires:

## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -1])
train_y <- train_data[,1]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))

test_data <- data.matrix(test_set)
test_x <- t(test_data[, -1])
test_y <- test_data[, 1]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
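A quick sanity check on the shapes, using the sample counts from the dim() calls above:

## Expect 28 x 28 x 1 x 22500 for training and 28 x 28 x 1 x 2500 for test
dim(train_array)
dim(test_array)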

Define the network architecture and train the model:

library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2,2 ))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
## Output
NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
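## (Optional) inspect the architecture before training: mxnet ships a
## graph.viz() helper that renders the symbol graph, assuming the
## DiagrammeR dependency is installed
graph.viz(NN_model)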

## Set seed for reproducibility
mx.set.seed(100)

## Device used. Sadly not the GPU :-(
device <- mx.cpu()

## Train the model on the 22,500 training samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
## Start training with 1 devices
## [1] Train-accuracy=0.495288885566923
## [2] Train-accuracy=0.495199996630351
## [3] Train-accuracy=0.495199996630351
## [4] Train-accuracy=0.495199996630351
## [5] Train-accuracy=0.495199996630351
## [6] Train-accuracy=0.4954222187731
## [7] Train-accuracy=0.499422218402227
## [8] Train-accuracy=0.53226666437255
## [9] Train-accuracy=0.584622220065859
## [10] Train-accuracy=0.625777777698305
## [11] Train-accuracy=0.656311112774743
## [12] Train-accuracy=0.678622224595812
## [13] Train-accuracy=0.701022223366631
## [14] Train-accuracy=0.718666667938232
## [15] Train-accuracy=0.73199999915229
## [16] Train-accuracy=0.744755555523766
## [17] Train-accuracy=0.761466666062673
## [18] Train-accuracy=0.773600000540415
## [19] Train-accuracy=0.788177773952484
## [20] Train-accuracy=0.797288886970944
## [21] Train-accuracy=0.807155555089315
## [22] Train-accuracy=0.812577776379055
## [23] Train-accuracy=0.816355553468068
## [24] Train-accuracy=0.825288886759016
## [25] Train-accuracy=0.833022219869826
## [26] Train-accuracy=0.842044444349077
## [27] Train-accuracy=0.848088889651828
## [28] Train-accuracy=0.85315555466546
## [29] Train-accuracy=0.849244442515903
## [30] Train-accuracy=0.859333332114749

After 30 iterations, this model achieves a peak training accuracy of about 86%. Next let’s see how it performs on the test set.

## Evaluate on the test set
predict_probs <- predict(model, test_array)
predicted_labels <- max.col(t(predict_probs)) - 1
table(test_data[, 1], predicted_labels)
##    predicted_labels
##       0   1
##   0 788 462
##   1 332 918
sum(diag(table(test_data[, 1], predicted_labels)))/2500
## [1] 0.6824

The model reaches about 68% accuracy on the test set. The score is of course a bit mediocre, but it can be improved by tuning the model, using more pixels and the RGB representation, or by increasing the number of images used for training.
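As a sketch of the "more pixels and RGB" idea: readImage already returns all three colour channels, so skipping the grayscale conversion and resizing to a larger size (the 64×64 below is illustrative) yields input arrays the network can consume once its input dimensions are adjusted:

## Keep the RGB channels at 64x64 instead of 28x28 grayscale (sizes illustrative)
img <- readImage(file.path(image_dir, "cat.1.jpg"))
img_resized <- resize(img, w = 64, h = 64)
rgb_array <- imageData(img_resized)   ## a 64 x 64 x 3 array
dim(rgb_array) <- c(64, 64, 3, 1)     ## (width, height, channel, sample)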