1. Business Question

This LBB addresses a problem involving unstructured data: a collection of images with 3 different labels: “Iris-setosa”, “Iris-versicolour”, and “Iris-virginica”. The data were collected by scraping images directly from Kaggle.

Using this dataset, we are expected to solve an image classification problem by building a model that can extract information from images and assign the correct label. We implement a deep learning model, which is well suited to unstructured data such as text and images.

Image classification is beneficial in many fields. In social media, a face recognition system will automatically detect your face and tag your friends if they are present in your posts. In wildlife conservation, image classification helps researchers label images based on the animals present in camera-trap photos. Why is this an important task? You can check how Unsplash, a stock photo website, uses deep learning to organize and create tags for each image in its collection. In this case, we will build an image classifier to label each image with its correct class.

Using this dataset, build a prediction model with the collection of images inside the train folder, then submit predictions for the images located in the test folder. The model should classify whether an image shows “Iris-setosa”, “Iris-versicolour”, or “Iris-virginica”.

2. Import Library and Read Data

2.1. Import Library

We use the library() function to load the required packages.

# Data wrangling
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.5     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.0.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Image manipulation
library(imager)
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following object is masked from 'package:stringr':
## 
##     boundary
## The following object is masked from 'package:tidyr':
## 
##     fill
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
# Deep learning
library(keras)

# Model Evaluation
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift

# use your conda environment
use_condaenv("r-tensorflow")

2.2. Read Data

2.2.1. Directory data

Let’s try to get the file name of each image. First, we need to locate the folder of each target class. The following code will give you the folder name inside the data_input folder.

# directory data
folder_list <- list.files("data_input/")

folder_list
## [1] "iris-setosa"      "iris-versicolour" "iris-virginica"

In the data_input folder, there are three folders: iris-setosa, iris-versicolour, and iris-virginica.

We combine the folder name with the path or directory of the data_input folder in order to access the content inside each folder.

# directory data
folder_path <- paste0("data_input/", folder_list, "/")

folder_path
## [1] "data_input/iris-setosa/"      "data_input/iris-versicolour/"
## [3] "data_input/iris-virginica/"

2.2.2. Get file name

We will use the map() function to iterate over the folders (iris-setosa, iris-versicolour, iris-virginica) and collect the file names in each one. map() returns a list, so to combine the file names from the 3 folders we simply use the unlist() function. We can use the head() function to check the first 6 images.

# Get file name
file_name <- map(folder_path, 
                 function(x) paste0(x, list.files(x))
                 ) %>% 
  unlist()

# first 6 file name
head(file_name)
## [1] "data_input/iris-setosa/iris-01ab65973fd487a6cee4c5af1551c42b264eec5abab46bffd7c307ffef647e11.jpg"
## [2] "data_input/iris-setosa/iris-0797945218a97d6e5251b4758a2ba1b418cbd52ce4ef46a3239e4b939bd9807b.jpg"
## [3] "data_input/iris-setosa/iris-0c826b6f4648edf507e0cafdab53712bb6fd1f04dab453cee8db774a728dd640.jpg"
## [4] "data_input/iris-setosa/iris-0ff5ba898a0ec179a25ca217af45374fdd06d606bb85fc29294291facad1776a.jpg"
## [5] "data_input/iris-setosa/iris-1289c57b571e8e98e4feb3e18a890130adc145b971b7e208a6ce5bad945b4a5a.jpg"
## [6] "data_input/iris-setosa/iris-16f7515e1d6aa6d7dd3af4bca38c8065bfab9d426c5fd75b3c4bc51d737fb9d0.jpg"

We can also check the last 6 images.

# last 6 file name
tail(file_name)
## [1] "data_input/iris-virginica/iris-cf5babcededb7088a8c809a8547729f3e2af1cf9fca9903fac3ab43dbb6f43a1.jpg"
## [2] "data_input/iris-virginica/iris-d99d5fd2de5be1419cbd569570dbb6c9a6c8ec4f0a1ff5b55dc2607f6ecdca8f.jpg"
## [3] "data_input/iris-virginica/iris-db756cf8db2d8da5fd7604d0955e65da8f633d1504fa1ab9e027f4e270bae17a.jpg"
## [4] "data_input/iris-virginica/iris-deb0300afea7ed2f19c52a5242f88177cfc7459c33c4c8cc583c313fa188d131.jpg"
## [5] "data_input/iris-virginica/iris-e4c769f972daaff8a78034bb7893052b0bfbc8252e9cf94e5e87a81f65912437.jpg"
## [6] "data_input/iris-virginica/iris-e8d3fd862aae1c005bcc80a73fd34b9e683634933563e7538b520f26fd315478.jpg"

2.2.3. Check Data Length

Let’s check how many images we have. We can use the length() or summary() function to check that.

# check the number of image files
# length(file_name)

summary(file_name)
##    Length     Class      Mode 
##       201 character character

The data contains 201 file names of class character.

3. Exploratory Data Analysis

3.1. Randomly Select 6 Images

To check the content of the file, we can use the load.image() function from the imager package. For example, let’s randomly visualize 6 images from the data.

# Randomly select image
set.seed(99)
sample_image <- sample(file_name, 6)

# Load image into R
img <- map(sample_image, load.image)

# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
map(img, plot)

## [[1]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[2]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[3]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[4]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[5]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[6]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3

3.2. Check Image Dimension

3.2.1. Explored the distribution of the image dimensions (height and width)

One important aspect of image classification is understanding the dimensions of the input images. We need to know the distribution of image dimensions to choose a proper input dimension for the deep learning model. Let’s check the properties of the first image.

# Full Image Description
img <- load.image(file_name[1])
img
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3

The output gives us the dimensions of the image. The height and width are measured in pixels. The colour channels indicate whether the image is in grayscale format (colour channels = 1) or RGB format (colour channels = 3). To get the value of each dimension, we can use the dim() function, which returns all four dimensions (width, height, depth, and colour channels).

# Image Dimension
dim(img)
## [1] 256 256   1   3

So we have successfully loaded an image and obtained its dimensions. In the following code, we create a function that gets the height and width of an image and returns them as a data.frame.

# Function for acquiring width and height of an image
get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}

get_dim(file_name[1])

Now we will sample 200 images from the file names and get the height and width of each image. We use sampling here because loading all images would take quite a long time.

# Randomly get 200 sample images
set.seed(123)
sample_file <- sample(file_name, 200)

# Run the get_dim() function for each image
file_dim <- map_df(sample_file, get_dim)

head(file_dim, 10)

Now let’s get the statistics for the image dimensions.

summary(file_dim)
##      height        width       filename        
##  Min.   :256   Min.   :256   Length:200        
##  1st Qu.:256   1st Qu.:256   Class :character  
##  Median :256   Median :256   Mode  :character  
##  Mean   :256   Mean   :256                     
##  3rd Qu.:256   3rd Qu.:256                     
##  Max.   :256   Max.   :256

The sampled images all have the same dimensions: both the minimum and maximum height are 256 pixels, and both the minimum and maximum width are 256 pixels.

We should pay attention to the image dimensions, because understanding them will help us in the next part of the process: data preprocessing.

4. Data Preprocessing

4.1. Demonstrated how to do image augmentation with an image generator.

Based on our previous summary of the image dimensions, we can determine the input dimension for the deep learning model. All input images should have the same dimensions. Here we choose an input size of 64 x 64 pixels, which means every image will be resized. Bigger dimensions preserve more features but also take longer to train; if the image size is too small, we lose a lot of information from the data. Balancing this trade-off is the art of data preprocessing in image classification.

We also set the batch size, so the model weights are updated every time it finishes training on a single batch. Here, we set the batch size to 32.

# Desired height and width of images
target_size <- c(64, 64)

# Batch size for training the model
batch_size <- 32

Since we have a small training set, we will build artificial data using a method called image augmentation. Image augmentation is a useful technique that increases the effective size of the training set without acquiring new images. The goal is to teach the model not only with the original images but also with modified versions, such as flipped, rotated, zoomed, or cropped images. This results in a more robust model. We can do data augmentation using the image data generator from keras.

To do image augmentation, we can fit the data into a generator. Here, we will create the image generator for keras with the following properties:

  • Scaling the pixel value by dividing the pixel value by 255
  • Flip the image horizontally
  • Flip the image vertically
  • Rotate the image from 0 to 45 degrees
  • Zoom in or zoom out by 25% (zoom 75% or 125%)
  • Use 20% of the data as validation dataset
# Image Generator
train_data_gen <- image_data_generator(rescale = 1./255, # Scaling pixel value
                                       horizontal_flip = T, # Flip image horizontally
                                       vertical_flip = T, # Flip image vertically 
                                       rotation_range = 45, # Rotate image from 0 to 45 degrees
                                       zoom_range = 0.25, # Zoom in or zoom out range
                                       validation_split = 0.2 # 20% data as validation data
                                       )
## Loaded Tensorflow version 2.0.0

We resize the images to input dimensions of 64 x 64. We scale/normalize the pixel values using the rescale parameter of image_data_generator(). We rotate and flip the images using the horizontal_flip, vertical_flip, and rotation_range parameters of image_data_generator(). We will use RGB via the color_mode parameter of flow_images_from_directory().

4.2. Explored the label/class distribution of the target variable

Now we can feed our image data into the generator using flow_images_from_directory(). The data is located inside the data_input folder, so the directory will be data_input/. From this process, we will get augmented images for both the training data and the validation data.

# Training Dataset - 80%
train_image_array_gen <- flow_images_from_directory(directory = "data_input/", # Folder of the data
                                                    target_size = target_size, # target image dimensions (64 x 64)
                                                    color_mode = "rgb", # use RGB color
                                                    batch_size = batch_size, 
                                                    seed = 123,  # set random seed
                                                    subset = "training", # declare that this is for training data
                                                    generator = train_data_gen
                                                    )

# Validation Dataset - 20%
val_image_array_gen <- flow_images_from_directory(directory = "data_input/",
                                                  target_size = target_size, 
                                                  color_mode = "rgb", 
                                                  batch_size = batch_size,
                                                  seed = 123,
                                                  subset = "validation", # declare that this is the validation data
                                                  generator = train_data_gen
                                                  )

We must make sure that the proportion of the target classes (Iris-setosa, Iris-versicolour, Iris-virginica) is balanced. If the class distribution is not equal (or close to equal), the resulting model may become biased or skewed toward the majority class. A quick check is sketched below.
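
As a minimal sketch (reusing the folder_list and folder_path objects from section 2.2, i.e. before any train/validation split), we can count the image files per class folder:

# Sketch: count image files per class folder
data.frame(class     = folder_list,
           n_images  = sapply(folder_path, function(x) length(list.files(x))),
           row.names = NULL)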

5. Model Fitting and Evaluation

5.1. Demonstrated how to prepare cross-validation data for this case

Here we will collect some information from the generator and check the class proportions of the training dataset. Each index corresponds to one label of the target variable, ordered alphabetically (Iris-setosa, Iris-versicolour, Iris-virginica); the mapping can be inspected as sketched below.
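
A minimal sketch for inspecting the label-to-index mapping (class_indices is an attribute of the underlying Keras directory iterator; the order is alphabetical):

# Sketch: mapping from class label to integer index
train_image_array_gen$class_indices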

# Number of training samples
train_samples <- train_image_array_gen$n

# Number of validation samples
valid_samples <- val_image_array_gen$n

# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)

# Get the class proportion
table("\nFrequency" = factor(train_image_array_gen$classes)
      ) %>% 
  prop.table()
## 
## Frequency
##         0         1         2 
## 0.3333333 0.3209877 0.3456790

The proportion of the training vs. validation dataset is 80% vs. 20%. We split the data so that the model can be evaluated on unseen data that was not used during training, as verified in the sketch below.
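
As a quick sanity check (a small sketch using the sample counts collected above), we can compute the actual split proportions from the generators:

# Sketch: actual train/validation proportions
train_samples / (train_samples + valid_samples)
valid_samples / (train_samples + valid_samples)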

5.2. Demonstrated how to build deep learning architecture

5.2.1. Convolutional Neural Network

The convolutional layer is the building block of the Convolutional Neural Network (CNN), a popular architecture for image classification. Recall that an image is just a 2-dimensional array with a certain height and width. For example, an image with 64 x 64 pixels has 4096 pixels distributed in a 64 x 64 array instead of a single one-dimensional vector. The benefit of keeping the image as a 2D array is that we can extract spatial features from it, such as the shapes and edges of objects in the image.

5.2.2. Model Architecture

We can start building the model architecture for the deep learning model. We will build a simple model first with the following layers:

  • Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • Flattening layer to flatten data from 2D array to 1D array
  • Dense layer to capture more information
  • Dense layer for output with softmax activation function

Don’t forget to set the input size in the first layer. If the input image is in RGB, set the final number to 3, which is the number of color channels. If the input image is in grayscale, set the final number to 1.

# input shape of the image
c(target_size, 3) 
## [1] 64 64  3
# Set Initial Random Weight
tensorflow::tf$random$set_seed(123)

model <- keras_model_sequential(name = "simple_model") %>% 
  
  # Convolution Layer
  layer_conv_2d(filters = 16,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = c(target_size, 3) 
                ) %>% 

  # Max Pooling Layer
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  # Flattening Layer
  layer_flatten() %>% 
  
  # Dense Layer
  layer_dense(units = 16,
              activation = "relu") %>% # capture more information from the flattened features
  
  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax", # softmax for multiclass classification
              name = "Output")
  
model
## Model
## Model: "simple_model"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d (Conv2D)                     (None, 64, 64, 16)              448         
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 32, 32, 16)              0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 16384)                   0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 16)                      262160      
## ________________________________________________________________________________
## Output (Dense)                      (None, 3)                       51          
## ================================================================================
## Total params: 262,659
## Trainable params: 262,659
## Non-trainable params: 0
## ________________________________________________________________________________

As shown in the summary above, we start by feeding 64 x 64 pixel images into the convolutional layer, which has 16 filters to extract features from the image. We then downsample by taking only the maximum value of each 2 x 2 pooling area, so the feature maps shrink to 32 x 32 pixels across the 16 filters. After that, we flatten the 2D arrays into a 1D array with 32 x 32 x 16 = 16384 nodes. We further extract information using a simple dense layer and finish by flowing the information into the output layer, which applies the softmax activation function to produce the probability of each class.
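
The parameter counts in the model summary can be verified with a quick back-of-the-envelope calculation (a sketch; the numbers follow directly from the layer definitions above):

# Sketch: where the parameter counts in the model summary come from
(3 * 3 * 3 + 1) * 16   # conv2d: 3x3 kernel x 3 input channels x 16 filters, plus 16 biases = 448
32 * 32 * 16           # flatten: 32 x 32 feature maps x 16 filters = 16384 nodes
16384 * 16 + 16        # dense: 16384 inputs x 16 units, plus 16 biases = 262160
16 * 3 + 3             # output: 16 inputs x 3 units, plus 3 biases = 51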

This first model uses a single convolutional (CNN) layer. We use the flatten layer to flatten the input without affecting the batch size: it collapses each sample in the batch to one dimension. The activation for the output layer is softmax.

5.3. Demonstrated how to properly do model fitting and evaluation.

We can start fitting the data into the model. Don’t forget to compile the model by specifying the loss function and the optimizer. For a start, we will train for 10 epochs. For multiclass classification, we use categorical cross-entropy as the loss function. For this example, we use the adam optimizer with a learning rate of 0.01. We will also evaluate the model with the validation data from the generator.

model %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(lr = 0.01),
    metrics = "accuracy"
  )
# # Fit data into model
# history <- model %>% 
#   fit(
#   # training data
#   train_image_array_gen,
# 
#   # training epochs
#   steps_per_epoch = as.integer(train_samples / batch_size), 
#   epochs = 10, 
#   
#   # validation data
#   validation_data = val_image_array_gen,
#   validation_steps = as.integer(valid_samples / batch_size)
# )
# 
# plot(history)

To fit the model, we use categorical_crossentropy as the loss function and optimizer_adam(lr = 0.01), training for 10 epochs.

5.4. Demonstrated how to properly do model selection by comparing models or making adjustment to single model.

Now we will further evaluate the model and compute the confusion matrix using the validation data from the generator. First, we need to acquire the file names of the images used as validation data. From the file names, we will extract the categorical label as the actual value of the target variable.

val_data <- data.frame(file_name = paste0("data_input/", val_image_array_gen$filenames)) %>% 
  mutate(class = str_extract(file_name, "iris-setosa|iris-versicolour|iris-virginica"))

head(val_data, 10)

What do we do next? We need to get the images into R by converting them into arrays. Since the input dimension of our CNN model is 64 x 64 pixels with 3 color channels (RGB), we do the same with the validation images. We use arrays because we want to predict the original images fresh from the folder; we do not use the image generator here since it would transform the images and no longer reflect the actual data.

# Function to convert image to array
image_prep <- function(x) {
  arrays <- lapply(x, function(path) {
    img <- image_load(path, target_size = target_size, 
                      grayscale = F # Set FALSE if image is RGB
                      )
    
    x <- image_to_array(img)
    x <- array_reshape(x, c(1, dim(x)))
    x <- x/255 # rescale image pixel
  })
  do.call(abind::abind, c(arrays, list(along = 1)))
}
# Use the `image_prep()` function on the validation file names and assign to object `test_x`
test_x <- image_prep(val_data$file_name)

# Check dimension of the validation set
dim(test_x)
## [1] 39 64 64  3

The validation data consists of 39 images with dimensions of 64 x 64 pixels and 3 color channels (RGB). Now that the data is prepared, we can proceed to predict the label of each image using our CNN model.

# `predict_classes()` function
pred_test <- predict_classes(model, test_x) 

head(pred_test, 10)
##  [1] 1 1 1 1 1 1 1 1 1 1

To make the predictions easier to interpret, we convert the integer encoding into the proper class labels.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "iris-setosa",
            x == 1 ~ "iris-versicolour",
            x == 2 ~ "iris-virginica"
            )
}

pred_test <- sapply(pred_test, decode) 

head(pred_test, 10)
##  [1] "iris-versicolour" "iris-versicolour" "iris-versicolour" "iris-versicolour"
##  [5] "iris-versicolour" "iris-versicolour" "iris-versicolour" "iris-versicolour"
##  [9] "iris-versicolour" "iris-versicolour"

Finally, we evaluate the model using the confusion matrix. The model performs very poorly, with low accuracy. We will tune the model by improving its architecture.

# confusion matrix
confusionMatrix(as.factor(pred_test), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##                   Reference
## Prediction         iris-setosa iris-versicolour iris-virginica
##   iris-setosa                0                0              0
##   iris-versicolour          13               11             14
##   iris-virginica             0                1              0
## 
## Overall Statistics
##                                         
##                Accuracy : 0.2821        
##                  95% CI : (0.15, 0.4487)
##     No Information Rate : 0.359         
##     P-Value [Acc > NIR] : 0.8801        
##                                         
##                   Kappa : -0.039        
##                                         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: iris-setosa Class: iris-versicolour
## Sensitivity                      0.0000                  0.9167
## Specificity                      1.0000                  0.0000
## Pos Pred Value                      NaN                  0.2895
## Neg Pred Value                   0.6667                  0.0000
## Prevalence                       0.3333                  0.3077
## Detection Rate                   0.0000                  0.2821
## Detection Prevalence             0.0000                  0.9744
## Balanced Accuracy                0.5000                  0.4583
##                      Class: iris-virginica
## Sensitivity                        0.00000
## Specificity                        0.96000
## Pos Pred Value                     0.00000
## Neg Pred Value                     0.63158
## Prevalence                         0.35897
## Detection Rate                     0.00000
## Detection Prevalence               0.02564
## Balanced Accuracy                  0.48000

The model performs poorly: it predicts almost every image as iris-versicolour, reaching an accuracy of only 0.2821. We use the confusion matrix function to obtain accuracy, the metric considered most important in this case. To improve performance, we will increase the number of units and filters and adjust the layers.
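
As a quick sanity check (a minimal sketch reusing the pred_test and val_data objects created above), the accuracy can also be computed directly from the predictions:

# Sketch: accuracy computed directly from the predictions
# (should match the accuracy reported by confusionMatrix(), 0.2821)
mean(pred_test == val_data$class)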

6. Tuning the Model

6.1. Model Architecture

Let’s look back at our model architecture. As you may have noticed, we can actually extract more information while the data is still a 2D image array. The first convolutional layer only extracts general features of the image before it is downsampled by the max pooling layer. Even after pooling, we still have a 32 x 32 array with a lot of information left to extract before flattening the data. Therefore, we can stack more CNN layers into the model so that more information is captured. We can also put 2 CNN layers consecutively before doing max pooling.

model
## Model
## Model: "simple_model"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d (Conv2D)                     (None, 64, 64, 16)              448         
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 32, 32, 16)              0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 16384)                   0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 16)                      262160      
## ________________________________________________________________________________
## Output (Dense)                      (None, 3)                       51          
## ================================================================================
## Total params: 262,659
## Trainable params: 262,659
## Non-trainable params: 0
## ________________________________________________________________________________

The following is the improved model architecture we experimented with. Note that in the final model fitted below, only the first convolutional layer (now with 32 filters and a larger 5 x 5 kernel) is kept, while the additional layers remain commented out in the code:

  • 1st Convolutional layer to extract features from 2D image with relu activation function
  • 2nd Convolutional layer to extract features from 2D image with relu activation function
  • Max pooling layer
  • 3rd Convolutional layer to extract features from 2D image with relu activation function
  • Max pooling layer
  • 4th Convolutional layer to extract features from 2D image with relu activation function
  • Max pooling layer
  • 5th Convolutional layer to extract features from 2D image with relu activation function
  • Max pooling layer
  • Flattening layer from 2D array to 1D array
  • Dense layer to capture more information
  • Dense layer for output layer

You can play and get creative by designing your own model architecture.

tensorflow::tf$random$set_seed(123)

model_big <- keras_model_sequential() %>% 
  
  # First convolutional layer
  layer_conv_2d(filters = 32, # 32 filters (vs 16 in the first model)
                kernel_size = c(5,5), # 5 x 5 kernel
                padding = "same",
                activation = "relu",
                input_shape = c(target_size, 3)
                ) %>% 
  
  # # Second convolutional layer
  # layer_conv_2d(filters = 64,
  #       kernel_size = c(3,3), # 3 x 3 filters
  #       padding = "same",
  #       activation = "relu"
  #     ) %>% 
  
  # Max pooling layer
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # # Third convolutional layer
  # layer_conv_2d(filters = 8,
  #           kernel_size = c(3,3),
  #           padding = "same",
  #           activation = "relu"
  #           ) %>% 
  # 
  # # Max pooling layer
  # layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # # Fourth convolutional layer
  # layer_conv_2d(filters = 32,
  #            kernel_size = c(5,5),
  #            padding = "same",
  #            activation = "relu"
  #            ) %>% 
  # 
  # # Max pooling layer
  # layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # # Fifth convolutional layer
  # layer_conv_2d(filters = 32,
  #              kernel_size = c(3,3),
  #              padding = "same",
  #              activation = "relu"
  #              ) %>% 
  # 
  # # Max pooling layer
  # layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # Flattening layer
  layer_flatten() %>% 
  
  # First Dense layer
  layer_dense(units = 16,
              activation = "relu") %>% 
  
  # # Second Dense layer
  # layer_dense(units = 16,
  #          activation = "relu") %>%
  
  # # Third Dense layer
  # layer_dense(units = 16,
  #        activation = "relu") %>%
  
  # # Fourth Dense layer
  # layer_dense(units = 32,
  #         activation = "relu") %>%
  #  
  # # Fifth Dense layer
  # layer_dense(units = 32,
  #           activation = "relu") %>%
  
  # Output layer
  layer_dense(name = "Output",
              units = 3, 
              activation = "softmax")

model_big
## Model
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d_1 (Conv2D)                   (None, 64, 64, 32)              2432        
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)      (None, 32, 32, 32)              0           
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 32768)                   0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 16)                      524304      
## ________________________________________________________________________________
## Output (Dense)                      (None, 3)                       51          
## ================================================================================
## Total params: 526,787
## Trainable params: 526,787
## Non-trainable params: 0
## ________________________________________________________________________________

6.2. Model Fitting

We can once again fit the model to the data. Since we have a small amount of data, we could train with more epochs; for this example, we again train for 10 epochs and keep the learning rate at 0.01.

# Make compile model
model_big %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(lr = 0.01),
    metrics = "accuracy"
  )
# # Fit data into model and plot training history
# history <- model_big %>% 
#   fit_generator(
#   # training data
#   train_image_array_gen,
#   
#   # epochs
#   steps_per_epoch = as.integer(train_samples / batch_size), 
#   epochs = 10, 
#   
#   # validation data
#   validation_data = val_image_array_gen,
#   validation_steps = as.integer(valid_samples / batch_size),
#   
#   # print progress but don't create graphic
#   verbose = 1,
#   view_metrics = 0
# )
# 
# plot(history)

6.3. Model Evaluation

Now we will further evaluate the model and acquire the confusion matrix for the validation data.

pred_test1 <- predict_classes(model_big, test_x) 

head(pred_test1, 10)
##  [1] 0 0 0 2 2 0 0 2 1 2

To make the predictions easier to interpret, we convert the integer encoding into the proper class labels.

# Convert encoding to label
decode1 <- function(x){
  case_when(x == 0 ~ "iris-setosa",
            x == 1 ~ "iris-versicolour",
            x == 2 ~ "iris-virginica"
            )
}

pred_test1 <- sapply(pred_test1, decode1) 

head(pred_test1, 10)
##  [1] "iris-setosa"      "iris-setosa"      "iris-setosa"      "iris-virginica"  
##  [5] "iris-virginica"   "iris-setosa"      "iris-setosa"      "iris-virginica"  
##  [9] "iris-versicolour" "iris-virginica"

Finally, we evaluate the model using the confusion matrix. This model performs better than the previous one because the convolutional layer uses more filters (32 instead of 16) and a larger 5 x 5 kernel, extracting more features from the image.

confusionMatrix(as.factor(pred_test1), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##                   Reference
## Prediction         iris-setosa iris-versicolour iris-virginica
##   iris-setosa                7                5              4
##   iris-versicolour           1                2              2
##   iris-virginica             5                5              8
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4359          
##                  95% CI : (0.2781, 0.6038)
##     No Information Rate : 0.359           
##     P-Value [Acc > NIR] : 0.2007          
##                                           
##                   Kappa : 0.1429          
##                                           
##  Mcnemar's Test P-Value : 0.2547          
## 
## Statistics by Class:
## 
##                      Class: iris-setosa Class: iris-versicolour
## Sensitivity                      0.5385                 0.16667
## Specificity                      0.6538                 0.88889
## Pos Pred Value                   0.4375                 0.40000
## Neg Pred Value                   0.7391                 0.70588
## Prevalence                       0.3333                 0.30769
## Detection Rate                   0.1795                 0.05128
## Detection Prevalence             0.4103                 0.12821
## Balanced Accuracy                0.5962                 0.52778
##                      Class: iris-virginica
## Sensitivity                         0.5714
## Specificity                         0.6000
## Pos Pred Value                      0.4444
## Neg Pred Value                      0.7143
## Prevalence                          0.3590
## Detection Rate                      0.2051
## Detection Prevalence                0.4615
## Balanced Accuracy                   0.5857

7. Predict Data in Testing Dataset

After training the model, and once you are satisfied with its performance on the validation dataset, we use it to predict the testing dataset. The testing data is located in the data_test folder.

7.1. Directory data

# directory data
folder_list_test <- list.files("data_test/")

head(folder_list_test)
## [1] "iris-setosa"      "iris-versicolour" "iris-virginica"

In the data_test folder, there are three folders: iris-setosa, iris-versicolour, and iris-virginica.

We combine the folder name with the path or directory of the data_test folder in order to access the content inside each folder.

# directory data
folder_path_test <- paste0("data_test/", folder_list_test, "/")

head(folder_path_test)
## [1] "data_test/iris-setosa/"      "data_test/iris-versicolour/"
## [3] "data_test/iris-virginica/"

7.2. Get file name

We will use the map() function to iterate over the folders and collect the file names in each one. map() returns a list, so to combine the file names from the folders we simply use the unlist() function. We can use the head() function to check the first 6 images.

# Get file name
file_name_test <- map(folder_path_test, 
                 function(x) paste0(x, list.files(x))
                 ) %>% 
  unlist()

# first 6 file name
head(file_name_test)
## [1] "data_test/iris-setosa/iris-1289c57b571e8e98e4feb3e18a890130adc145b971b7e208a6ce5bad945b4a5a.jpg"
## [2] "data_test/iris-setosa/iris-1f941001f508ff1bd492457a90da64e52c461bfd64587a3cf7c6bf1bcb35adab.jpg"
## [3] "data_test/iris-setosa/iris-20f5f654ae5fbcc405b465ce257c187f81eb5fc070531f940be42f1424c3fb44.jpg"
## [4] "data_test/iris-setosa/iris-2abfd90b157f1bc4170c24cc8c258d776a58926f0efca787961210f60bce76be.jpg"
## [5] "data_test/iris-setosa/iris-332953f4d6a355ca189e2508164b24360fc69f83304e7384ca2203ddcb7c73b5.jpg"
## [6] "data_test/iris-setosa/iris-40d5d5b3aacd405930c5e03689455ec3001e6601daad468e28b3e65126b404ab.jpg"

We create a new data frame with a single column, id, and assign it to the object df_test.

df_test <- data.frame(id = file_name_test)

df_test

Then, we convert the images into an array of dimensions (number of images, 64, 64, 3) using the image_prep() function.

test_x1 <- image_prep(df_test$id)

# Check dimension of testing data set
dim(test_x1)
## [1] 45 64 64  3

The testing data consists of 45 images with dimensions of 64 x 64 pixels and 3 color channels (RGB). Now that the data is prepared, we can proceed to predict the label of each image using our CNN model.

# predict the label of each image
pred_test2 <- predict_classes(model_big, test_x1) 

head(pred_test2, 10)
##  [1] 2 2 1 0 0 2 0 0 2 2

To make the predictions easier to interpret, we convert the integer encoding into the proper class labels.

# Convert encoding to label
decode2 <- function(x){
  case_when(x == 0 ~ "iris-setosa",
            x == 1 ~ "iris-versicolour",
            x == 2 ~ "iris-virginica"
            )
}

pred_test2 <- sapply(pred_test2, decode2) 

head(pred_test2, 10)
##  [1] "iris-virginica"   "iris-virginica"   "iris-versicolour" "iris-setosa"     
##  [5] "iris-setosa"      "iris-virginica"   "iris-setosa"      "iris-setosa"     
##  [9] "iris-virginica"   "iris-virginica"

We read the image_data_test.csv file, create a new label column from the object pred_test2, and assign the result to the object image_data_test.

image_data_test <- read.csv("image_data_test.csv") %>% 
  mutate(label=pred_test2)

We save the dataframe into file image_test.csv.

# save the data frame into new file csv
write.csv(x = image_data_test, file = "image_test.csv", row.names = FALSE)

We check the newly created file.

# check the new file
pred_image_test<-read.csv("image_test.csv")
head(pred_image_test)

Based on the model performance above, we have not reached the appropriate performance level (>75% accuracy) needed to solve the business problem.

8. Conclusion

My goal has not been achieved. The problem should be solvable with machine learning, but we have not met the required model performance. The highest accuracy we obtained is 0.4359, with the following model settings:

  • target_size = 64 x 64
  • batch_size = 32
  • loss = “categorical_crossentropy”
  • optimizer = optimizer_adam(lr = 0.01)
  • filters = 32 with a 5 x 5 kernel in the convolutional layer
  • units = 16 in the dense layer