MXNet is a deep learning framework designed for both efficiency and flexibility. This is an example of using MXNet in R for image classification; I will use the dataset from the Kaggle competition Dogs vs. Cats.

1 Install the Package

install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")

2 Data Preprocessing

Download the training dataset from https://www.kaggle.com/c/dogs-vs-cats/data.

First, set the directory that contains the training images (the image names themselves are listed later, inside the extraction function):

## Directory for images
image_dir <- "...\\train"

EBImage provides general-purpose functionality for image processing and analysis. To install this package:

source("https://bioconductor.org/biocLite.R")
biocLite("EBImage")

Read and display some examples from the training set:

library(EBImage)
example_cat_image <- readImage(file.path(image_dir, "cat.0.jpg"))
display(example_cat_image)

example_dog_image <- readImage(file.path(image_dir, "dog.0.jpg"))
display(example_dog_image)

As a quick example, I will use EBImage to resize the images to \(28\times28\) and convert them to greyscale so that they can be loaded into R easily. To do so, I will use a function that processes the cat and dog images separately. Each image is turned into a vector of length 784, with each element representing the value of one pixel.
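
As a quick sanity check before wrapping this in a function, here is the same pipeline applied to the example cat image loaded above (just a sketch):

## Resize one image to 28x28, convert it to greyscale, and flatten it
img_small <- channel(resize(example_cat_image, w = 28, h = 28), "gray")
length(as.vector(imageData(img_small)))  # 28 * 28 = 784 values per image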

## Set width and height for resizing images
width <- 28
height <- 28
## pbapply adds a progress bar to the *apply family of functions;
## pblapply is a drop-in replacement for lapply
library(pbapply)
extract_feature <- function(dir_path, width, height, is_cat = TRUE, add_label = TRUE) {
  img_size <- width*height
  ## List images in path
  images_names <- list.files(dir_path)
  if (add_label) {
    ## Select only cats or dogs images
    images_names <- images_names[grepl(ifelse(is_cat, "cat", "dog"), images_names)]
    ## Set label, cat = 0, dog = 1
    label <- ifelse(is_cat, 0, 1)
  }
  print(paste("Start processing", length(images_names), "images"))
  ## Resize each image, convert it to greyscale, and flatten it into a vector
  feature_list <- pblapply(images_names, function(imgname) {
    ## Read image
    img <- readImage(file.path(dir_path, imgname))
    ## Resize image
    img_resized <- resize(img, w = width, h = height)
    ## Set to grayscale
    grayimg <- channel(img_resized, "gray")
    ## Get the image data as a matrix
    img_matrix <- imageData(grayimg)
    ## Coerce to a vector
    img_vector <- as.vector(t(img_matrix))
    return(img_vector)
  })
  ## Bind the list of vectors into a matrix
  feature_matrix <- do.call(rbind, feature_list)
  feature_matrix <- as.data.frame(feature_matrix)
  ## Set names
  names(feature_matrix) <- paste0("pixel", 1:img_size)
  if (add_label) {
    ## Add label
    feature_matrix <- cbind(label = label, feature_matrix)
  }
  return(feature_matrix)
}

Process the cat and dog images separately and save them into data frames:

cats_data <- extract_feature(dir_path = image_dir, width = width, height = height)
dogs_data <- extract_feature(dir_path = image_dir, width = width, height = height, is_cat = FALSE)
dim(cats_data)
## [1] 12500   785
dim(dogs_data)
## [1] 12500   785

Save the data just in case:

saveRDS(cats_data, "cat.rds")
saveRDS(dogs_data, "dog.rds")
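
They can then be reloaded later without re-running the feature extraction:

cats_data <- readRDS("cat.rds")
dogs_data <- readRDS("dog.rds")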

3 Model Training

Data partitioning: randomly assign 90% of the data to the training set, keeping cats and dogs equally represented, and use the remaining 10% as the test set.

library(caret)
## Bind rows in a single dataset
complete_set <- rbind(cats_data, dogs_data)
## test/training partitions
training_index <- createDataPartition(complete_set$label, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- complete_set[training_index,]
dim(train_set)
## [1] 22500   785
test_set <- complete_set[-training_index,]
dim(test_set)
## [1] 2500  785
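
A quick check, not in the original code, that the split keeps the two classes balanced:

## Proportion of cats (0) and dogs (1) in each partition; both should be close to 0.5
prop.table(table(train_set$label))
prop.table(table(test_set$label))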

Reshape the data into the format required by the model:

## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -1])
train_y <- train_data[,1]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))

test_data <- data.matrix(test_set)
test_x <- t(test_data[, -1])
test_y <- test_data[, 1]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
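
As a sanity check, each array should now hold one 28x28x1 slice per image; a single slice can also be displayed with EBImage (the image may look transposed because the pixel vectors were built row by row). This is a small sketch, not part of the original code:

dim(train_array)  # 28 28 1 22500
dim(test_array)   # 28 28 1 2500
display(Image(train_array[, , 1, 1]))  # first preprocessed training image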

Training the model:

library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer: 5x5 kernel, 20 filters
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer: 5x5 kernel, 50 filters
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
## Output
NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
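
For reference, the spatial sizes implied by this stack can be worked out by hand: each valid 5x5 convolution shrinks the input by 4 pixels per side and each 2x2/stride-2 max pooling halves it, so the flatten layer receives 4 x 4 x 50 = 800 values, and num_hidden = 2 matches the two classes. A small sketch of that arithmetic:

## 28x28 -> conv 5x5 -> 24x24 -> pool -> 12x12 -> conv 5x5 -> 8x8 -> pool -> 4x4
conv_out <- function(n, k) n - k + 1    # valid convolution
pool_out <- function(n, k = 2) n %/% k  # max pooling with stride k
s <- pool_out(conv_out(pool_out(conv_out(28, 5)), 5))
s * s * 50  # 800 inputs to the first fully connected layer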

## Set seed for reproducibility
mx.set.seed(100)

## Device used. Sadly not the GPU :-(
device <- mx.cpu()

## Train on the 22,500 training samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
## Start training with 1 devices
## [1] Train-accuracy=0.493973214285715
## [2] Train-accuracy=0.492755555555556
## [3] Train-accuracy=0.492755555555556
## [4] Train-accuracy=0.4932
## [5] Train-accuracy=0.495688888888889
## [6] Train-accuracy=0.5264
## [7] Train-accuracy=0.583644444444444
## [8] Train-accuracy=0.6212
## [9] Train-accuracy=0.648577777777778
## [10] Train-accuracy=0.671822222222222
## [11] Train-accuracy=0.692444444444445
## [12] Train-accuracy=0.710533333333333
## [13] Train-accuracy=0.724622222222222
## [14] Train-accuracy=0.735955555555555
## [15] Train-accuracy=0.747288888888889
## [16] Train-accuracy=0.759111111111111
## [17] Train-accuracy=0.772355555555556
## [18] Train-accuracy=0.7848
## [19] Train-accuracy=0.792133333333334
## [20] Train-accuracy=0.803822222222223
## [21] Train-accuracy=0.814977777777778
## [22] Train-accuracy=0.821822222222223
## [23] Train-accuracy=0.825155555555556
## [24] Train-accuracy=0.824933333333334
## [25] Train-accuracy=0.834044444444444
## [26] Train-accuracy=0.827066666666667
## [27] Train-accuracy=0.828177777777778
## [28] Train-accuracy=0.837644444444445
## [29] Train-accuracy=0.848133333333333
## [30] Train-accuracy=0.849422222222222

After 30 iterations, the model reaches a peak training accuracy of about 85%. Next, let's see how it performs on the test set.

## Evaluate on the test set
predict_probs <- predict(model, test_array)
## Pick the class with the highest predicted probability (0 = cat, 1 = dog)
predicted_labels <- max.col(t(predict_probs)) - 1
table(test_data[, 1], predicted_labels)
##    predicted_labels
##        0    1
##   0 1099  151
##   1  470  780
sum(diag(table(test_data[, 1], predicted_labels)))/2500
## [1] 0.7516
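
The confusion matrix shows the model does noticeably better on cats (the 0 row) than on dogs; per-class accuracy makes this explicit (a small addition to the evaluation above):

## Per-class accuracy from the confusion matrix
cm <- table(test_data[, 1], predicted_labels)
diag(cm) / rowSums(cm)  # roughly 0.88 for cats, 0.62 for dogs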

The model reaches about 75% accuracy on the test set. The score is admittedly mediocre, but it can be improved by tuning the model, using higher-resolution images, and keeping the RGB channels instead of converting to greyscale.
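
Finally, if you want to reuse the trained network later without retraining, it can be serialized to disk (a sketch using MXNet's mx.model.save / mx.model.load helpers; the prefix "cats_dogs_cnn" is just an example name):

## Save the trained model and reload it later
mx.model.save(model, "cats_dogs_cnn", 30)
model <- mx.model.load("cats_dogs_cnn", 30)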