Deep learning has proved to be a very powerful tool because of its ability to handle large amounts of data. Interest in models with hidden layers has surpassed interest in traditional techniques, especially in pattern recognition. One of the most popular deep neural network architectures is the Convolutional Neural Network. A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and learns to differentiate one from another. In this article, we will build a Convolutional Neural Network that classifies whether an image shows a beach, a forest, or a mountain.
knitr::include_graphics("new-zealand.jpeg")
# Data wrangling
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Image manipulation
library(imager)
## Warning: package 'imager' was built under R version 4.0.5
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
##
## add
## The following object is masked from 'package:stringr':
##
## boundary
## The following object is masked from 'package:tidyr':
##
## fill
## The following objects are masked from 'package:stats':
##
## convolve, spectrum
## The following object is masked from 'package:graphics':
##
## frame
## The following object is masked from 'package:base':
##
## save.image
# Deep learning
library(tensorflow)
## Warning: package 'tensorflow' was built under R version 4.0.5
library(keras)
# Use conda environment (select it before the first call that initialises TensorFlow)
use_condaenv("r-tensorflow")
tf_version()
## [1] '2.4'
# Model Evaluation
library(caret)
## Warning: package 'caret' was built under R version 4.0.5
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:tensorflow':
##
## train
## The following object is masked from 'package:purrr':
##
## lift
options(scipen = 999)
In image classification problems, it is common practice to put the images into separate folders according to their target class/label. For example, inside the train folder of our data you can see three folders, one each for beach, forest, and mountain.
folder_list <- list.files("data/train/")
folder_path <- paste0("data/train/", folder_list, "/")
file_name <- map(folder_path,
function(x) paste0(x, list.files(x))
) %>%
unlist()
head(file_name)
## [1] "data/train/beach/beach_100.jpeg" "data/train/beach/beach_101.jpeg"
## [3] "data/train/beach/beach_102.jpeg" "data/train/beach/beach_103.jpeg"
## [5] "data/train/beach/beach_104.jpeg" "data/train/beach/beach_105.jpeg"
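Because each class has its own folder, the label of an image can be read directly from the name of its parent directory. Below is a minimal base-R sketch. It builds a throwaway folder tree with hypothetical file names under `tempdir()` in place of `data/train`, then recovers the labels with `basename(dirname())`:

```r
# Build a small throwaway folder tree that mimics data/train/<class>/<file>
root <- file.path(tempdir(), "train_demo")
for (cls in c("beach", "forest", "mountain")) {
  dir.create(file.path(root, cls), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, cls, paste0(cls, "_", 1:2, ".jpeg")))
}

# List every file, then use the parent folder name as the class label
paths <- list.files(root, recursive = TRUE, full.names = TRUE)
labels <- basename(dirname(paths))
table(labels)
```

The same idea gives the actual labels of `file_name` without hard-coding the class names.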
First, let's randomly inspect 6 images from the data.
set.seed(28)
sample_image <- sample(file_name, 6)
# Load image into R
img <- map(sample_image, load.image)
# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
map(img, plot)
## [[1]]
## Image. Width: 364 pix Height: 138 pix Depth: 1 Colour channels: 3
##
## [[2]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3
##
## [[3]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3
##
## [[4]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3
##
## [[5]]
## Image. Width: 367 pix Height: 137 pix Depth: 1 Colour channels: 3
##
## [[6]]
## Image. Width: 325 pix Height: 155 pix Depth: 1 Colour channels: 3
To build a deep learning model, we need to understand the dimensions of the input images. If the dimensions vary, we can then choose a suitable common target size for the model. Let's write a function that collects the height and width of each image.
get_dim <- function(x){
img <- load.image(x)
df_img <- data.frame(height = height(img),
width = width(img),
filename = x
)
return(df_img)
}
# Run the get_dim() function for each image
file_dim <- map_df(file_name, get_dim)
head(file_dim, 10)
## height width filename
## 1 181 279 data/train/beach/beach_100.jpeg
## 2 135 372 data/train/beach/beach_101.jpeg
## 3 170 296 data/train/beach/beach_102.jpeg
## 4 183 276 data/train/beach/beach_103.jpeg
## 5 190 266 data/train/beach/beach_104.jpeg
## 6 168 300 data/train/beach/beach_105.jpeg
## 7 159 318 data/train/beach/beach_106.jpeg
## 8 184 274 data/train/beach/beach_107.jpeg
## 9 225 225 data/train/beach/beach_108.jpeg
## 10 126 401 data/train/beach/beach_109.jpeg
summary(file_dim)
## height width filename
## Min. : 94.0 Min. :100.0 Length:1453
## 1st Qu.:168.0 1st Qu.:268.0 Class :character
## Median :183.0 Median :275.0 Mode :character
## Mean :177.8 Mean :282.4
## 3rd Qu.:184.0 3rd Qu.:300.0
## Max. :314.0 Max. :534.0
The image dimensions vary considerably: heights range from 94 to 314 pixels and widths from 100 to 534 pixels, although half of the images are between 168 and 184 pixels tall and between 268 and 300 pixels wide. Understanding these dimensions will help us in the next step, data pre-processing.
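Every image will be resized to a single target size, so it helps to see how much each one gets squeezed. The following base-R sketch uses the first five height/width pairs printed above and assumes the 128 x 128 target chosen in the next step:

```r
# First five (height, width) pairs from head(file_dim) above
dims <- data.frame(height = c(181, 135, 170, 183, 190),
                   width  = c(279, 372, 296, 276, 266))

target <- 128                                          # assumed square target size
dims$aspect_ratio <- round(dims$width / dims$height, 2) # > 1 means landscape
dims$x_scale <- round(dims$width / target, 2)           # horizontal shrink factor
dims$y_scale <- round(dims$height / target, 2)          # vertical shrink factor
dims
```

These images are all landscape, so their horizontal shrink factor is larger than their vertical one. A square target therefore compresses them more horizontally, which is a side effect of choosing a square input size.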
Based on this summary, we can choose the input dimensions for the deep learning model. All input images must have the same dimensions. Larger dimensions preserve more features but take longer to train, while images that are too small lose a lot of information. Balancing this trade-off is part of the art of pre-processing for image classification. For now, we will try 128 x 128 pixels. We also set the batch size, which is the number of images the model processes before each weight update. Here, we set the batch size to 32.
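You can make the size trade-off concrete by counting the input values per image and per batch. The following base-R sketch does this. The 64 x 64 alternative is only there for comparison, and the memory figure assumes float32 inputs (4 bytes per value):

```r
# Number of input values for one RGB image with the given side length
values_per_image <- function(side, channels = 3) side * side * channels

sizes <- c(64, 128)
per_image <- values_per_image(sizes)
per_batch <- per_image * 32                # batch_size = 32
mb_per_batch <- per_batch * 4 / 1024^2     # assuming 4-byte float32 values

data.frame(size = paste(sizes, "x", sizes), per_image, per_batch, mb_per_batch)
```

Doubling the side length quadruples the number of input values, from 12,288 to 49,152 per image.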
# Desired height and width of images
target_size <- c(128, 128)
# Batch size for training the model
batch_size <- 32
Since our training set is small, we will use image augmentation to create additional variations of the training images. Image augmentation, available in keras through the image data generator, randomly modifies the training images (for example by flipping, rotating, zooming, or cropping them) without acquiring new images. The goal is to train the model not only on the original images but also on modified versions of them, which makes the model more robust. To apply augmentation, we pass the data through a generator. Here, we create the keras image generator, with each property explained in a code comment.
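Here is a toy base-R sketch of what two of these transformations do, applied to a 3 x 4 single-channel "image". It mimics a horizontal flip and a one-pixel shift with `fill_mode = "nearest"`, plus the `1/255` rescaling. It illustrates the idea only and is not how keras implements the generator. The helper names are hypothetical.

```r
# Toy 3 x 4 single-channel image with pixel values 1..12
img <- matrix(1:12, nrow = 3, byrow = TRUE)

# Horizontal flip: reverse the column order
flip_h <- function(m) m[, rev(seq_len(ncol(m))), drop = FALSE]

# Shift right by k pixels; the columns uncovered on the left are filled
# by repeating the nearest edge column, as fill_mode = "nearest" does
shift_right_nearest <- function(m, k) {
  src <- pmax(seq_len(ncol(m)) - k, 1)
  m[, src, drop = FALSE]
}

# rescale = 1/255 maps pixel values 0..255 to 0..1
rescale <- function(m) m / 255

flip_h(img)
shift_right_nearest(img, 1)
```

The first row `1 2 3 4` becomes `4 3 2 1` after the flip and `1 1 2 3` after the shift.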
train_data_gen <- image_data_generator(rescale = 1/255, # Rescale pixel values to the 0-1 range
horizontal_flip = TRUE, # Randomly flip images horizontally
rotation_range = 20, # Randomly rotate images by up to 20 degrees
validation_split = 0.2, # Use 20% of the data as validation data
shear_range = 0.2, # Shear intensity (shear angle in degrees)
brightness_range = c(0.7, 1.3), # Randomly scale brightness by a factor between 0.7 and 1.3
width_shift_range = 0.2, # Randomly shift horizontally by up to 20% of the width
height_shift_range = 0.2, # Randomly shift vertically by up to 20% of the height
fill_mode = "nearest" # Fill pixels created by shifts/rotations with the nearest edge pixel
)
Now it's time to feed the image data into the generator using flow_images_from_directory(). The images are located inside the train folder of the data folder, so the directory is data/train. This gives us augmented images for both the training data and the validation data.
# Training Dataset
train_image_array_gen <- flow_images_from_directory(directory = "data/train/", # Folder of the data
target_size = target_size, # target image dimension (128 x 128)
color_mode = "rgb", # use RGB color
batch_size = batch_size,
seed = 28, # set random seed
subset = "training", # declare that this is for training data
class_mode = "categorical",
generator = train_data_gen
)
# Validation Dataset
val_image_array_gen <- flow_images_from_directory(directory = "data/train/",
target_size = target_size,
color_mode = "rgb",
batch_size = batch_size,
seed = 28,
subset = "validation", # declare that this is the validation data
class_mode = "categorical",
generator = train_data_gen
)
Next, we collect some information from the generators and check the class proportions of the training data. The class indices correspond to the labels of the target variable in alphabetical order (0 = beach, 1 = forest, 2 = mountain).
# Number of training samples
train_samples <- train_image_array_gen$n
# Number of validation samples
valid_samples <- val_image_array_gen$n
# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)
table("\nFrequency" = factor(train_image_array_gen$classes)
) %>%
prop.table()
##
## Frequency
## 0 1 2
## 0.3324742 0.3256014 0.3419244
The classes are now well balanced. Originally, however, the class proportions in the training data were around 0.32, 0.30, and 0.37. I balanced them by manually duplicating images of the smaller classes (oversampling). Imbalanced classes can bias a classifier toward the majority class and make its accuracy misleading.
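Here, the duplication was done by copying files by hand. The base-R sketch below does the equivalent in code: it duplicates randomly chosen rows of a (path, label) table until every class matches the largest one. The file table and the `oversample()` helper are hypothetical.

One caveat applies. `validation_split` is applied to the same folder afterwards, so copies of one image can end up in both the training and validation subsets, which may make validation scores look optimistic. Oversampling only the training portion avoids that.

```r
# Random oversampling: duplicate randomly chosen rows of each minority class
# until every class has as many rows as the largest class
oversample <- function(df, label_col = "label", seed = 28) {
  set.seed(seed)
  target <- max(table(df[[label_col]]))
  parts <- lapply(split(df, df[[label_col]]), function(d) {
    extra <- target - nrow(d)
    if (extra == 0) return(d)
    rbind(d, d[sample.int(nrow(d), extra, replace = TRUE), , drop = FALSE])
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Hypothetical, deliberately imbalanced file table
files <- data.frame(
  path  = c("b1.jpeg", "b2.jpeg", "b3.jpeg", "f1.jpeg", "f2.jpeg", "m1.jpeg"),
  label = c("beach", "beach", "beach", "forest", "forest", "mountain")
)
balanced <- oversample(files)
table(balanced$label)
```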
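After pre-processing the data, we can start building the model architecture. Our first model has the following layers:
1st Convolutional layer (16 filters) to extract features from 2D image with relu activation function
Max pooling layer
2nd Convolutional layer (32 filters) to extract features from 2D image with relu activation function
Max pooling layer
3rd Convolutional layer (64 filters) to extract features from 2D image with relu activation function
Max pooling layer
4th Convolutional layer (128 filters) to extract features from 2D image with relu activation function
Max pooling layer
Flattening layer from the 3D feature maps to a 1D array
Two dense layers (128 and 64 units) with relu activation function to capture more information
Dropout layer (rate 0.5) to reduce overfitting
Dense output layer with softmax activation function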
Don't forget to set the input shape in the first layer. For RGB images, set the last element of the input shape to 3, which is the number of color channels. For grayscale images, set it to 1.
input_shape <- c(target_size, 3)
tf$random$set_seed(28)
model <- keras_model_sequential(name = "where_am_I") %>%
# Convolution Layer
layer_conv_2d(filters = 16,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = input_shape) %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 64,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 128,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening Layer
layer_flatten() %>%
# Dense Layer
layer_dense(units = 128,
activation = "relu") %>%
layer_dense(units = 64,
activation = "relu") %>%
layer_dropout(rate = 0.5) %>%
# Output Layer
layer_dense(units = output_n,
activation = "softmax",
name = "Output")
model
## Model
## Model: "where_am_I"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_3 (Conv2D) (None, 128, 128, 16) 448
## ________________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D) (None, 64, 64, 16) 0
## ________________________________________________________________________________
## conv2d_2 (Conv2D) (None, 64, 64, 32) 4640
## ________________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D) (None, 32, 32, 32) 0
## ________________________________________________________________________________
## conv2d_1 (Conv2D) (None, 32, 32, 64) 18496
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 16, 16, 64) 0
## ________________________________________________________________________________
## conv2d (Conv2D) (None, 16, 16, 128) 73856
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 8, 8, 128) 0
## ________________________________________________________________________________
## flatten (Flatten) (None, 8192) 0
## ________________________________________________________________________________
## dense_1 (Dense) (None, 128) 1048704
## ________________________________________________________________________________
## dense (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dropout (Dropout) (None, 64) 0
## ________________________________________________________________________________
## Output (Dense) (None, 3) 195
## ================================================================================
## Total params: 1,154,595
## Trainable params: 1,154,595
## Non-trainable params: 0
## ________________________________________________________________________________
We start by feeding 128 x 128 pixel images into the first convolutional layer, which has 16 filters to extract features from the image. The argument padding = "same" keeps the feature maps at 128 x 128 pixels after the convolution. We then downsample by keeping only the maximum value in each 2 x 2 pooling window, so the data shrinks to 64 x 64 pixels with 16 channels (one per filter). This repeats until we reach 8 x 8 pixels with 128 channels. Next, we flatten this 8 x 8 x 128 array into a 1D vector of 8 x 8 x 128 = 8192 nodes. We extract further information with simple dense layers. Finally, the output layer uses the softmax activation function to turn its outputs into a probability for each class.
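The output shapes and parameter counts in the model summary follow from simple arithmetic:

- A 3 x 3 convolution with c input channels and f filters has (3 * 3 * c + 1) * f parameters.
- Each 2 x 2 pooling halves the side length.
- A dense layer with n inputs and u units has (n + 1) * u parameters.

The base-R sketch below applies these rules. `count_params()` is a hypothetical helper, and it assumes the layout used here: 3 x 3 kernels, "same" padding, pooling after every convolution, and a parameter-free dropout layer.

```r
# Count trainable parameters of a conv -> pool stack followed by dense layers.
# Assumes 3 x 3 kernels, "same" padding, 2 x 2 max pooling after every conv,
# and a dropout layer (no parameters) before the output layer.
count_params <- function(filters, side = 128, channels = 3,
                         dense_units = c(128, 64), n_out = 3, k = 3) {
  in_ch <- c(channels, head(filters, -1))       # input channels of each conv
  conv <- sum((k * k * in_ch + 1) * filters)    # weights + biases
  final_side <- side / 2^length(filters)        # each pooling halves the side
  flat <- final_side^2 * tail(filters, 1)       # length after layer_flatten()
  units_in <- c(flat, dense_units)
  units_out <- c(dense_units, n_out)
  dense <- sum((units_in + 1) * units_out)
  c(flatten = flat, conv = conv, dense = dense, total = conv + dense)
}

count_params(c(16, 32, 64, 128))        # first model
count_params(c(16, 32, 64, 128, 256))   # five-block model tried later
```

Both totals match the model summaries (1,154,595 and 925,475). Most of the first model's parameters sit in the first dense layer (8,193 x 128 = 1,048,704). This is why adding a fifth convolution and pooling block actually reduces the total parameter count.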
Now that the data and the model are ready, it is time to fit the model. As a starting point, we train for 25 epochs. For multi-class classification, we use categorical cross-entropy as the loss function and the Adam optimizer with its default learning rate (0.001). We also monitor the model's performance on the validation data from the generator.
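The following base-R sketch shows what the softmax output and the categorical cross-entropy loss compute for a single image. The logits are hypothetical. The small clipping constant only avoids `log(0)` and is not meant to reproduce keras internals exactly.

```r
# Numerically stable softmax: subtract the largest logit before exponentiating
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Categorical cross-entropy for one example:
# y is a one-hot vector, p the predicted class probabilities
categorical_crossentropy <- function(y, p, eps = 1e-7) {
  -sum(y * log(pmax(p, eps)))
}

p <- softmax(c(2, 1, 0.1))   # hypothetical logits for beach, forest, mountain
y <- c(1, 0, 0)              # true class: beach
categorical_crossentropy(y, p)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.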
model %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(),
metrics = "accuracy"
)
# Fit data into model
history <- model %>%
fit(
# training data
train_image_array_gen,
# number of batches per epoch
steps_per_epoch = as.integer(train_samples/batch_size),
epochs = 25,
# validation data
validation_data = val_image_array_gen,
validation_steps = as.integer(valid_samples/batch_size),
verbose = 2
)
plot(history)
## `geom_smooth()` using formula 'y ~ x'
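In `fit()`, `as.integer(train_samples / batch_size)` truncates, so the final partial batch is not used in an epoch. The base-R sketch below shows the arithmetic. It assumes 1,164 training images, derived as the 1,453 files summarised earlier minus the 289 validation images.

```r
train_samples <- 1164   # assumed: 1453 files - 289 validation images
valid_samples <- 289
batch_size <- 32

steps_truncated <- as.integer(train_samples / batch_size)  # as used in fit()
steps_ceiling <- ceiling(train_samples / batch_size)       # includes the partial batch
skipped <- train_samples - steps_truncated * batch_size    # images left out per epoch

c(steps_truncated = steps_truncated, steps_ceiling = steps_ceiling, skipped = skipped)
as.integer(valid_samples / batch_size)                    # validation steps
```

Using `ceiling()` instead would include the last partial batch in every epoch.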
Next, we evaluate the model further with a confusion matrix on the validation data from the generator. First, we need the file names of the images used as validation data. From each file name, we extract the class label, which serves as the actual value of the target variable.
val_data <- data.frame(file_name = paste0("data/train/", val_image_array_gen$filenames)) %>%
mutate(class = str_extract(file_name, "beach|forest|mountain"))
head(val_data, 10)
## file_name class
## 1 data/train/beach\\beach_100.jpeg beach
## 2 data/train/beach\\beach_101.jpeg beach
## 3 data/train/beach\\beach_102.jpeg beach
## 4 data/train/beach\\beach_103.jpeg beach
## 5 data/train/beach\\beach_104.jpeg beach
## 6 data/train/beach\\beach_105.jpeg beach
## 7 data/train/beach\\beach_106.jpeg beach
## 8 data/train/beach\\beach_107.jpeg beach
## 9 data/train/beach\\beach_108.jpeg beach
## 10 data/train/beach\\beach_109.jpeg beach
To predict with the model, we load each image into R and convert it into an array. The CNN expects images of 128 x 128 pixels with 3 color channels (RGB), so we resize the validation images to the same dimensions. We use arrays rather than the image generator because we want to predict on the original images straight from the folder. The generator would apply random augmentation, so its images would not reflect the actual ones.
image_prep <- function(x) {
arrays <- lapply(x, function(path) {
img <- image_load(path, target_size = target_size,
grayscale = FALSE # Set FALSE if image is RGB
)
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x))) # add a leading batch dimension
x / 255 # rescale pixel values to the 0-1 range
})
# Stack the single-image arrays along the first (batch) dimension
do.call(abind::abind, c(arrays, list(along = 1)))
}
val_x <- image_prep(val_data$file_name)
# Check dimension of testing data set
dim(val_x)
## [1] 289 128 128 3
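`image_prep()` uses `abind::abind()` to stack the single-image arrays into one 4D array of shape (images, height, width, channels). If you don't have abind, here is a base-R sketch of the same stacking. `stack_images()` is a hypothetical helper, and small random arrays stand in for real images.

```r
# Stack a list of equally sized (height, width, channels) arrays into a
# single (n_images, height, width, channels) array
stack_images <- function(arrays) {
  d <- dim(arrays[[1]])
  out <- array(0, dim = c(length(arrays), d))
  for (i in seq_along(arrays)) {
    out[i, , , ] <- arrays[[i]]
  }
  out
}

set.seed(28)
# Two hypothetical 4 x 4 RGB "images" with pixel values 0..255, rescaled to 0..1
imgs <- lapply(1:2, function(i) {
  array(sample(0:255, 4 * 4 * 3, replace = TRUE), dim = c(4, 4, 3)) / 255
})
batch <- stack_images(imgs)
dim(batch)
```

The result has the same channels-last layout as `val_x`, just with 2 images of 4 x 4 pixels.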
The validation data consists of 289 images of 128 x 128 pixels with 3 color channels (RGB). With the validation data prepared, we can now predict the label of each image using our CNN model.
pred_val1 <- predict_classes(model, val_x)
head(pred_val1, 10)
## [1] 0 0 0 0 0 0 0 0 0 0
For easier presentation, we will convert the encoding into the true class label.
# Convert encoding to label
decode <- function(x){
case_when(x == 0 ~ "beach",
x == 1 ~ "forest",
x == 2 ~ "mountain"
)
}
pred_val1 <- sapply(pred_val1, decode)
head(pred_val1)
## [1] "beach" "beach" "beach" "beach" "beach" "beach"
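`predict_classes()` works with the TensorFlow 2.4 setup used here, but it is deprecated in later Keras releases. A common alternative is to take the column with the highest probability from `predict()` and map it to a label with a lookup vector. The base-R sketch below applies this to a hypothetical probability matrix that stands in for the output of `predict(model, val_x)`:

```r
# Hypothetical softmax outputs: one row per image, columns = beach, forest, mountain
probs <- matrix(c(0.80, 0.15, 0.05,
                  0.10, 0.70, 0.20,
                  0.05, 0.25, 0.70),
                nrow = 3, byrow = TRUE)

class_labels <- c("beach", "forest", "mountain")  # alphabetical, as in the generator

# Column index of the largest probability in each row (1-based in R)
pred_index <- max.col(probs, ties.method = "first")
pred_label <- class_labels[pred_index]
pred_label
```

Subtracting 1 from `pred_index` gives the 0-based codes that `decode()` expects.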
Finally, we evaluate the model with the confusion matrix below. The first model already achieves a high accuracy of 91%, and its precision (Pos Pred Value), sensitivity, and specificity are all above 80% for every class. We may be able to improve it further by tuning the model architecture.
confusionMatrix(as.factor(pred_val1),
as.factor(val_data$class)
)
## Confusion Matrix and Statistics
##
## Reference
## Prediction beach forest mountain
## beach 86 0 7
## forest 0 91 6
## mountain 10 3 86
##
## Overall Statistics
##
## Accuracy : 0.91
## 95% CI : (0.871, 0.9404)
## No Information Rate : 0.3426
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.865
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: beach Class: forest Class: mountain
## Sensitivity 0.8958 0.9681 0.8687
## Specificity 0.9637 0.9692 0.9316
## Pos Pred Value 0.9247 0.9381 0.8687
## Neg Pred Value 0.9490 0.9844 0.9316
## Prevalence 0.3322 0.3253 0.3426
## Detection Rate 0.2976 0.3149 0.2976
## Detection Prevalence 0.3218 0.3356 0.3426
## Balanced Accuracy 0.9298 0.9687 0.9001
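You can recompute the main statistics reported by caret from the confusion matrix counts above. Here is a base-R sketch, with rows as predictions and columns as reference labels:

```r
labs <- c("beach", "forest", "mountain")
cm <- matrix(c(86,  0,  7,
                0, 91,  6,
               10,  3, 86),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = labs, Reference = labs))

n <- sum(cm)
tp <- diag(cm)
accuracy    <- sum(tp) / n
sensitivity <- tp / colSums(cm)   # recall per class
precision   <- tp / rowSums(cm)   # "Pos Pred Value" in caret
specificity <- (n - rowSums(cm) - colSums(cm) + tp) / (n - colSums(cm))
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement
kappa       <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = kappa), 4)
round(rbind(sensitivity, specificity, precision), 4)
```

Balanced accuracy in caret is simply the average of sensitivity and specificity for each class.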
We will try adding another convolutional layer and max pooling layer to extract more features from the images. Here is the new model architecture:
1st Convolutional layer to extract features from 2D image with relu activation function
Max pooling layer
2nd Convolutional layer to extract features from 2D image with relu activation function
Max pooling layer
3rd Convolutional layer to extract features from 2D image with relu activation function
Max pooling layer
4th Convolutional layer to extract features from 2D image with relu activation function
Max pooling layer
5th Convolutional layer to extract features from 2D image with relu activation function
Max pooling layer
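Flattening layer from the 3D feature maps to a 1D array
Two dense layers (128 and 64 units) with relu activation function to capture more information
Dropout layer (rate 0.5)
Dense output layer with softmax activation function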
We could be more creative when designing our own architecture, but since the first model already reaches an accuracy of 91%, we will keep things simple and add just one more convolutional and max pooling layer.
tf$random$set_seed(28)
model_tune2 <- keras_model_sequential(name = "where_am_I") %>%
# Convolution Layer and Max Pooling Layer
layer_conv_2d(filters = 16,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = input_shape) %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 64,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 128,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_conv_2d(filters = 256,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening Layer
layer_flatten() %>%
# Dense Layer
layer_dense(units = 128,
activation = "relu") %>%
layer_dense(units = 64,
activation = "relu") %>%
layer_dropout(rate = 0.5) %>%
# Output Layer
layer_dense(units = output_n,
activation = "softmax",
name = "Output")
model_tune2
## Model
## Model: "where_am_I"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_8 (Conv2D) (None, 128, 128, 16) 448
## ________________________________________________________________________________
## max_pooling2d_8 (MaxPooling2D) (None, 64, 64, 16) 0
## ________________________________________________________________________________
## conv2d_7 (Conv2D) (None, 64, 64, 32) 4640
## ________________________________________________________________________________
## max_pooling2d_7 (MaxPooling2D) (None, 32, 32, 32) 0
## ________________________________________________________________________________
## conv2d_6 (Conv2D) (None, 32, 32, 64) 18496
## ________________________________________________________________________________
## max_pooling2d_6 (MaxPooling2D) (None, 16, 16, 64) 0
## ________________________________________________________________________________
## conv2d_5 (Conv2D) (None, 16, 16, 128) 73856
## ________________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D) (None, 8, 8, 128) 0
## ________________________________________________________________________________
## conv2d_4 (Conv2D) (None, 8, 8, 256) 295168
## ________________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D) (None, 4, 4, 256) 0
## ________________________________________________________________________________
## flatten_1 (Flatten) (None, 4096) 0
## ________________________________________________________________________________
## dense_3 (Dense) (None, 128) 524416
## ________________________________________________________________________________
## dense_2 (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dropout_1 (Dropout) (None, 64) 0
## ________________________________________________________________________________
## Output (Dense) (None, 3) 195
## ================================================================================
## Total params: 925,475
## Trainable params: 925,475
## Non-trainable params: 0
## ________________________________________________________________________________
We use the same Adam optimizer, categorical cross-entropy loss, and 25 epochs.
model_tune2 %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(),
metrics = "accuracy"
)
# Fit data into model
history_tune2 <- model_tune2 %>%
fit(
# training data
train_image_array_gen,
# number of batches per epoch
steps_per_epoch = as.integer(train_samples/batch_size),
epochs = 25,
# validation data
validation_data = val_image_array_gen,
validation_steps = as.integer(valid_samples/batch_size),
verbose = 2
)
plot(history_tune2)
## `geom_smooth()` using formula 'y ~ x'
## Model Evaluation
pred_val <- predict_classes(model_tune2, val_x)
head(pred_val, 10)
## [1] 0 0 0 0 0 0 0 0 0 0
pred_val <- sapply(pred_val, decode)
head(pred_val)
## [1] "beach" "beach" "beach" "beach" "beach" "beach"
confusionMatrix(as.factor(pred_val),
as.factor(val_data$class)
)
## Confusion Matrix and Statistics
##
## Reference
## Prediction beach forest mountain
## beach 84 0 5
## forest 0 89 6
## mountain 12 5 88
##
## Overall Statistics
##
## Accuracy : 0.9031
## 95% CI : (0.863, 0.9347)
## No Information Rate : 0.3426
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.8546
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: beach Class: forest Class: mountain
## Sensitivity 0.8750 0.9468 0.8889
## Specificity 0.9741 0.9692 0.9105
## Pos Pred Value 0.9438 0.9368 0.8381
## Neg Pred Value 0.9400 0.9742 0.9402
## Prevalence 0.3322 0.3253 0.3426
## Detection Rate 0.2907 0.3080 0.3045
## Detection Prevalence 0.3080 0.3287 0.3633
## Balanced Accuracy 0.9245 0.9580 0.8997
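Compared with the first model, overall accuracy drops slightly from 0.91 to 0.9031. Precision for beach (0.9247 to 0.9438) and sensitivity for mountain (0.8687 to 0.8889) improve. Most other per-class statistics decrease slightly, including the balanced accuracy of every class, so the deeper architecture does not outperform the first model.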
Our goal was to classify whether an image shows a beach, a forest, or a mountain. Both models achieve this with an accuracy of about 90% and all reported per-class performance metrics above 80%. This shows that a Convolutional Neural Network can solve this image classification problem well. CNNs are commonly used for computer vision and image recognition. Nowadays, researchers use them for OCR, object detection in self-driving cars, face recognition on social media, quantitative financial investment, medical imaging, and much more.