Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, animals, writing and actions in images. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition.
In this project we will use MXNet to distinguish dogs from cats. MXNet is a deep learning framework designed for both efficiency and flexibility. We will use the dataset from the Kaggle competition Dogs vs. Cats.
To follow this project you will need RStudio installed, along with the MXNet R package. You can refer to this Medium article for more details.
Download the training dataset from https://www.kaggle.com/c/dogs-vs-cats/data.
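Assuming the Kaggle archive was saved to your Downloads folder (the exact file names below are assumptions; adjust them to match what you actually downloaded), the training images can be extracted from R with base unzip():

## Extract the outer archive, then the training images themselves
unzip("~/Downloads/dogs-vs-cats.zip", exdir = "~/Downloads")
unzip("~/Downloads/train.zip", exdir = "~/Downloads")  ## creates ~/Downloads/train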
First set the directory for the images:

## Directory for images
image_dir <- "/Users/Cartwheel/Downloads/train"

EBImage provides general purpose functionality for image processing and analysis. To install this package:

source("https://bioconductor.org/biocLite.R")
## Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
biocLite("EBImage")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## Installing package(s) 'EBImage'
##
## The downloaded binary packages are in
## /var/folders/91/82q46qr51bl4_k18yrfgsq5r0000gn/T//RtmpfzCI7X/downloaded_packages

Load the package, then read and display a few examples from the training set:

library(EBImage)
example_cat_image <- readImage(file.path(image_dir, "cat.1.jpg"))
display(example_cat_image)
example_dog_image <- readImage(file.path(image_dir, "dog.3.jpg"))
display(example_dog_image)
As a quick example, I will use EBImage to resize the images to 28×28 and convert them to greyscale so that they are easy to load into R. To do so, I will use a function that processes the cat and dog images separately. Each image will be flattened into a vector of length 784 (28 × 28), with each element representing a pixel's greyscale value.
## Set width and height for resizing images
width <- 28
height <- 28
## pbapply adds progress bars to the *apply family of functions;
## pblapply is a drop-in replacement for lapply
library(pbapply)
extract_feature <- function(dir_path, width, height, is_cat = TRUE, add_label = TRUE) {
  img_size <- width * height
  ## List images in path
  images_names <- list.files(dir_path)
  if (add_label) {
    ## Keep only cat or dog images
    images_names <- images_names[grepl(ifelse(is_cat, "cat", "dog"), images_names)]
    ## Set label: cat = 0, dog = 1
    label <- ifelse(is_cat, 0, 1)
  }
  print(paste("Start processing", length(images_names), "images"))
  ## Resize each image, convert it to greyscale and flatten it into a vector
  feature_list <- pblapply(images_names, function(imgname) {
    ## Read image
    img <- readImage(file.path(dir_path, imgname))
    ## Resize image
    img_resized <- resize(img, w = width, h = height)
    ## Convert to greyscale
    grayimg <- channel(img_resized, "gray")
    ## Get the image as a matrix
    img_matrix <- grayimg@.Data
    ## Coerce to a vector
    img_vector <- as.vector(t(img_matrix))
    return(img_vector)
  })
  ## Bind the list of vectors into a matrix
  feature_matrix <- do.call(rbind, feature_list)
  feature_matrix <- as.data.frame(feature_matrix)
  ## Set column names
  names(feature_matrix) <- paste0("pixel", c(1:img_size))
  if (add_label) {
    ## Add label column
    feature_matrix <- cbind(label = label, feature_matrix)
  }
  return(feature_matrix)
}
Process the cat and dog images separately and save the results into data frames:
cats_data <- extract_feature(dir_path = image_dir, width = width, height = height)
## [1] "Start processing 12500 images"
dogs_data <- extract_feature(dir_path = image_dir, width = width, height = height, is_cat = FALSE)
## [1] "Start processing 12500 images"
dim(cats_data)
## [1] 12500 785
dim(dogs_data)
## [1] 12500 785
In all there are 12,500 images of cats and 12,500 images of dogs.
Save the data just in case:
saveRDS(cats_data, "cat.rds")
saveRDS(dogs_data, "dog.rds")
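If you restart the session later, the processed data can be reloaded with readRDS() instead of reprocessing all 25,000 images:

## Reload the saved feature data frames
cats_data <- readRDS("cat.rds")
dogs_data <- readRDS("dog.rds")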
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
## Bind rows in a single dataset
complete_set <- rbind(cats_data, dogs_data)
Split the data in two: 90% for training and 10% for validation.

## test/training partitions
training_index <- createDataPartition(complete_set$label, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- complete_set[training_index,]
dim(train_set)
## [1] 22500 785
test_set <- complete_set[-training_index,]
dim(test_set)
## [1] 2500 785
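As an optional sanity check, verify that the split kept the two classes roughly balanced (label 0 = cat, 1 = dog):

## Proportion of each label in the two partitions
prop.table(table(train_set$label))
prop.table(table(test_set$label))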
Reshape the data into the array format required by the model (28 × 28 × 1 × N, where N is the number of images):

## Reshape train and test sets into 28x28x1 image arrays
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -1])
train_y <- train_data[, 1]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))
test_data <- data.matrix(test_set)
test_x <- t(test_data[, -1])
test_y <- test_data[, 1]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
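To verify that the reshaping worked, you can display one of the reshaped 28×28 training images with EBImage (a quick optional check; at this resolution the image will look coarse, and the orientation may appear transposed):

## Display the first training image from the reshaped array
display(Image(train_array[, , 1, 1]))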
Define the network architecture and train the model:
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
## Output
NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
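## Optional: inspect the architecture before training. graph.viz() ships with
## the mxnet R package, though it requires the DiagrammeR package to be installed.
graph.viz(NN_model)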
## Set seed for reproducibility
mx.set.seed(100)
## Device used. Sadly not the GPU :-(
device <- mx.cpu()
## Train on the 22,500 training samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
ctx = device,
num.round = 30,
array.batch.size = 100,
learning.rate = 0.05,
momentum = 0.9,
wd = 0.00001,
eval.metric = mx.metric.accuracy,
epoch.end.callback = mx.callback.log.train.metric(100))
## Start training with 1 devices
## [1] Train-accuracy=0.495288885566923
## [2] Train-accuracy=0.495199996630351
## [3] Train-accuracy=0.495199996630351
## [4] Train-accuracy=0.495199996630351
## [5] Train-accuracy=0.495199996630351
## [6] Train-accuracy=0.4954222187731
## [7] Train-accuracy=0.499422218402227
## [8] Train-accuracy=0.53226666437255
## [9] Train-accuracy=0.584622220065859
## [10] Train-accuracy=0.625777777698305
## [11] Train-accuracy=0.656311112774743
## [12] Train-accuracy=0.678622224595812
## [13] Train-accuracy=0.701022223366631
## [14] Train-accuracy=0.718666667938232
## [15] Train-accuracy=0.73199999915229
## [16] Train-accuracy=0.744755555523766
## [17] Train-accuracy=0.761466666062673
## [18] Train-accuracy=0.773600000540415
## [19] Train-accuracy=0.788177773952484
## [20] Train-accuracy=0.797288886970944
## [21] Train-accuracy=0.807155555089315
## [22] Train-accuracy=0.812577776379055
## [23] Train-accuracy=0.816355553468068
## [24] Train-accuracy=0.825288886759016
## [25] Train-accuracy=0.833022219869826
## [26] Train-accuracy=0.842044444349077
## [27] Train-accuracy=0.848088889651828
## [28] Train-accuracy=0.85315555466546
## [29] Train-accuracy=0.849244442515903
## [30] Train-accuracy=0.859333332114749
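If you want to keep the per-epoch accuracies for plotting rather than just printing them, the mxnet R package provides a logger object that the logging callback can write into (a sketch; the training call is the same as above, only the callback argument changes):

## Capture training accuracy per epoch in a logger
logger <- mx.metric.logger$new()
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     epoch.end.callback = mx.callback.log.train.metric(100, logger))
## Plot the learning curve
plot(logger$train, type = "l", xlab = "Epoch", ylab = "Train accuracy")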
After 30 iterations, this model reaches a training accuracy of about 86%. Next let's see how it performs on the test set.
## Predict on the test set
predict_probs <- predict(model, test_array)
predicted_labels <- max.col(t(predict_probs)) - 1
table(test_data[, 1], predicted_labels)
##    predicted_labels
##       0   1
##   0 788 462
##   1 332 918
sum(diag(table(test_data[, 1], predicted_labels)))/2500
## [1] 0.6824
The model reaches about 68% accuracy on the test set. That score is of course mediocre, but it can be improved by tuning the model, using higher-resolution inputs, and keeping all three RGB channels instead of greyscale. Another way to improve accuracy is to increase the number of training images.
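As a sketch of the colour-channel idea (the changes below are assumptions, not tested code): skip the greyscale conversion in extract_feature, return width × height × 3 values per image, and give the reshaped array a channel dimension of 3. MXNet infers the input shape, so the network definition above can stay the same:

## Hypothetical colour variant. Inside extract_feature, replace the greyscale
## steps with:
##   img_vector <- as.vector(imageData(img_resized))  ## length = width*height*3
## and set img_size <- width * height * 3 for the column names.
## The training array then keeps 3 channels (train_set_rgb is the hypothetical
## colour data frame produced by the modified function):
train_data_rgb <- data.matrix(train_set_rgb)
train_x_rgb <- t(train_data_rgb[, -1])
train_array_rgb <- train_x_rgb
dim(train_array_rgb) <- c(28, 28, 3, ncol(train_x_rgb))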