Introduction

Image Classification

Image classification is useful in many fields. In social media, a face recognition system can automatically detect faces in your posts and tag the friends who appear in them. In wildlife conservation, image classification helps researchers label camera-trap images based on the animals present in them. In this case, we will build an image classifier to help a stock photo website categorize its image database by thematic location.

Dataset

“Where Were You” is a challenge about working with unstructured data in the form of a collection of images. The data consists of images with 3 different labels: Beach, Forest, or Mountain. The data were collected by scraping images directly from Google image search. The goal is to solve an image classification problem by building a model that can extract information from an image and assign it the correct label. Using this dataset, we are going to build a prediction model that classifies whether an image shows a Beach, a Forest, or a Mountain.

Import Library

# Data wrangling
library(tidyverse)

# Image manipulation
library(imager)

# Deep learning
library(keras)

# Model Evaluation
library(caret)

options(scipen = 999)

Data Preprocessing and Exploratory Data Analysis

Let’s explore the data first before building the model. In image classification problems, it is common practice to put the images in separate folders based on their target class/label. For example, inside the train folder in our data, we have 3 different folders, one each for Beach, Forest, and Mountain.

Now let’s get the file name of each image. First, we need to locate the folder of each target class. The following code returns the folder names inside the train folder.

folder_list <- list.files("Capstone ML Data/7. where-were-you-cl/data/train/")

folder_list
## [1] "beach"    "forest"   "mountain"

We combine each folder name with the path of the train folder so that we can access the content inside each folder.

folder_path <- paste0("Capstone ML Data/7. where-were-you-cl/data/train/", folder_list, "/")

folder_path
## [1] "Capstone ML Data/7. where-were-you-cl/data/train/beach/"   
## [2] "Capstone ML Data/7. where-were-you-cl/data/train/forest/"  
## [3] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/"

We will use the map() function to iterate over the folders and collect the file names inside each one (Beach, Forest, and Mountain). Since map() returns a list, we combine the file names from the 3 folders into a single vector with the unlist() function.

# Get file name
file_name <- map(folder_path, 
                 function(x) paste0(x, list.files(x))
                 ) %>% 
  unlist()

# first 6 file names
head(file_name)
## [1] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_100.jpeg"
## [2] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_101.jpeg"
## [3] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_102.jpeg"
## [4] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_103.jpeg"
## [5] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_104.jpeg"
## [6] "Capstone ML Data/7. where-were-you-cl/data/train/beach/beach_105.jpeg"
# last 6 file names
tail(file_name)
## [1] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountain_a_45.jpeg"
## [2] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountian_115.jpeg" 
## [3] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountian_45.jpeg"  
## [4] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountian_72.jpeg"  
## [5] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountian_74.jpeg"  
## [6] "Capstone ML Data/7. where-were-you-cl/data/train/mountain/mountian_95.jpeg"

Let’s check how many images we have.

length(file_name)
## [1] 1328
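
As a quick sanity check, we can also count how many images belong to each class directly from the file paths. The following is a minimal sketch using str_extract() from the tidyverse, relying on the folder name embedded in each path as the label:

# Count images per class based on the folder name in the file path
table(str_extract(file_name, "beach|forest|mountain"))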

To check the content of the files, we can use the load.image() function from the imager package. For example, let’s randomly select and visualize 6 images from the data.

# Randomly select image
set.seed(99)
sample_image <- sample(file_name, 6)

# Load image into R
img <- map(sample_image, load.image)

# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
map(img, plot)

## [[1]]
## Image. Width: 355 pix Height: 142 pix Depth: 1 Colour channels: 3 
## 
## [[2]]
## Image. Width: 318 pix Height: 159 pix Depth: 1 Colour channels: 3 
## 
## [[3]]
## Image. Width: 300 pix Height: 168 pix Depth: 1 Colour channels: 3 
## 
## [[4]]
## Image. Width: 305 pix Height: 165 pix Depth: 1 Colour channels: 3 
## 
## [[5]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3 
## 
## [[6]]
## Image. Width: 285 pix Height: 177 pix Depth: 1 Colour channels: 3

Explored the distribution of the image dimensions

One of the important aspects of image classification is understanding the dimensions of the input images. You need to know the distribution of the image dimensions in order to choose a proper input dimension for the deep learning model.

To get the value of each dimension, we can use the dim() function. For an image loaded with imager it returns the width, height, depth, and number of color channels. The width and height are measured in pixels. The color channels indicate whether the image is in grayscale format (color channels = 1) or in RGB format (color channels = 3).
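
For example, calling dim() on a single loaded image returns these four values at once (a quick illustrative check; the actual numbers depend on the image):

# Dimensions of the first image: width, height, depth, color channels
dim(load.image(file_name[1]))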

In the following code, we create a function that gets the height and width of an image and stores them in a data.frame.

# Function for acquiring width and height of an image
get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}

Now we will get the height and width of all of the images and convert them into a data.frame.

# Run the get_dim() function for each image
file_dim <- map_df(file_name, get_dim)

head(file_dim, 10)

Now let’s get the statistics for the image dimensions.

summary(file_dim)
##      height          width         filename        
##  Min.   : 94.0   Min.   :100.0   Length:1328       
##  1st Qu.:168.0   1st Qu.:268.0   Class :character  
##  Median :183.0   Median :275.0   Mode  :character  
##  Mean   :178.2   Mean   :282.9                     
##  3rd Qu.:184.0   3rd Qu.:300.0                     
##  Max.   :314.0   Max.   :534.0
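
If a visual check is preferred, the same information can be plotted, for example as a scatter plot of width against height. This is a minimal sketch using ggplot2, which is loaded as part of the tidyverse:

# Visualize the spread of image widths and heights
file_dim %>% 
  ggplot(aes(x = width, y = height)) +
  geom_point(alpha = 0.3) +
  labs(title = "Distribution of image dimensions",
       x = "Width (pixels)",
       y = "Height (pixels)")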

Demonstrated how to do image augmentation with an image generator

Based on the previous summary of the image dimensions, we can determine the input dimension for the deep learning model. All input images must have the same dimensions, so for this case we will resize every image to 128 x 128 pixels.

We also set the batch size, so that the model weights are updated every time it finishes training on a single batch. Here, we set the batch size to 8.

# Desired height and width of images
target_size <- c(128, 128)

# Batch size for training the model
batch_size <- 8

Since we only have a small training set, we will create artificial data using a method called image augmentation. Image augmentation is a useful technique for increasing the effective size of the training set without acquiring new images. The goal is to teach the model not only with the original images but also with modified versions of them, such as flipped, rotated, zoomed, or cropped images. This produces a more robust model. We can do data augmentation using the image data generator from keras.

To do image augmentation, we can feed the data into a generator. Here, we will create the image generator for keras with the following properties:

  • Scale the pixel values by dividing them by 255
  • Flip the images horizontally
  • Shift the images left or right by up to 20%
  • Shift the images up or down by up to 20%
  • Zoom in or zoom out by up to 20% (zoom between 80% and 120%)
  • Adjust the brightness within the range 1-2
  • Fill empty pixels with the nearest pixel
  • Use 20% of the data as the validation dataset
# Image Generator
train_data_gen <- image_data_generator(rescale = 1/255,
                                       horizontal_flip = T,
                                       width_shift_range = 0.2,
                                       height_shift_range = 0.2,
                                       zoom_range = 0.2,
                                       brightness_range = c(1,2),
                                       fill_mode = "nearest",
                                       validation_split = 0.2
                                       )

Now we can feed our image data into the generator using flow_images_from_directory(). The data is located inside the train folder within the data folder. From this process, we get the augmented images for both the training data and the validation data.

# Training Dataset
train_image_array_gen <- flow_images_from_directory(directory = "Capstone ML Data/7. where-were-you-cl/data/train/",
                                                    target_size = target_size,
                                                    color_mode = "rgb",
                                                    batch_size = batch_size , 
                                                    seed = 123,
                                                    subset = "training",
                                                    generator = train_data_gen,
                                                    class_mode = "categorical"
                                                    )

# Validation Dataset
val_image_array_gen <- flow_images_from_directory(directory = "Capstone ML Data/7. where-were-you-cl/data/train/",
                                                  target_size = target_size, 
                                                  color_mode = "rgb", 
                                                  batch_size = batch_size ,
                                                  seed = 123,
                                                  subset = "validation",
                                                  generator = train_data_gen,
                                                  class_mode = "categorical"
                                                  )
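
Before moving on, we can optionally pull a single batch from the training generator to confirm that the images come out with the expected shape. This is only a sketch; generator_next() should return a list containing an image array and a one-hot label matrix:

# Inspect one augmented batch
batch <- generator_next(train_image_array_gen)
dim(batch[[1]]) # image array, expected batch_size x 128 x 128 x 3
dim(batch[[2]]) # one-hot encoded labels, expected batch_size x 3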

Explored the label/class distribution of the target variable

Here we will collect some information from the generator and check the class proportions of the training dataset. The indices correspond to the labels of the target variable and are ordered alphabetically (Beach, Forest, and Mountain).

# Number of training samples
train_samples <- train_image_array_gen$n

# Number of validation samples
valid_samples <- val_image_array_gen$n

# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)

# Get the class proportion
table("\nFrequency" = factor(train_image_array_gen$classes)
      ) %>% 
  prop.table()
## 
## Frequency
##         0         1         2 
## 0.3214286 0.3045113 0.3740602

The proportion for each label is 0.32 for beach, 0.30 for forest, and 0.37 for mountain.
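
To confirm which index maps to which label, we can also inspect the generator's class index mapping directly (a quick check; the mapping should follow the alphabetical folder order):

# Mapping of class labels to integer indices
train_image_array_gen$class_indices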

There is a slight class imbalance, with the mountain class having the largest proportion, but the difference is small enough to be tolerated. A strong class imbalance could bias the model towards the majority class, resulting in poor classification of the minority classes.

Model Fitting and Evaluation

Demonstrated how to prepare cross-validation data for this case

The dataset is divided into 80% for training and 20% for validation, because we need to train and evaluate the model before it is used on a real case. The training dataset is used to train the model, while the validation dataset is used to evaluate it.
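
Since the split is handled by the validation_split argument of the image generator, we can verify the resulting sizes of the two subsets using the sample counts collected earlier (a quick check):

# Number of images in the training and validation subsets
c(train = train_samples, validation = valid_samples)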

Demonstrated how to build a deep learning architecture

We can start building the model architecture for the deep learning model. We will first build a simple model with the following layers:

  • First Convolutional layer to extract features from 2D image with relu activation function and 32 filters
  • First Max Pooling layer to downsample the image features
  • Second Convolutional layer to extract features from 2D image with relu activation function and 64 filters
  • Second Max Pooling layer to downsample the image features
  • Third Convolutional layer to extract features from 2D image with relu activation function and 128 filters
  • Third Max Pooling layer to downsample the image features
  • Flattening layer to flatten data from 2D array to 1D array
  • Dense layer to capture more information
  • Dense layer for output with softmax activation function

Don’t forget to set the input shape in the first layer. Because the input images are in RGB, we set the final number to 3, which is the number of color channels. If the input images were grayscale, we would set the final number to 1.

# Set Initial Random Weight
tensorflow::tf$random$set_seed(123)

model <- keras_model_sequential(name = "CNN_Model") %>% 
  
  # Convolution Layer 1
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = c(target_size, 3) 
                ) %>% 

  # Max Pooling Layer 1
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>% 
  
  # Convolution Layer 2
  layer_conv_2d(filters = 64,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>% 

  # Max Pooling Layer 2
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>% 
  
  # Convolution Layer 3
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max Pooling Layer 3
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>%
  
  # Flattening Layer
  layer_flatten() %>% 
  
  # Dense Layer1
  layer_dense(units = 128,
              activation = "relu") %>%

  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax",
              name = "Output")
  
model
## Model
## Model: "CNN_Model"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_2 (Conv2D)                  (None, 128, 128, 32)            896         
##                                                                                 
##  max_pooling2d_2 (MaxPooling2D)     (None, 64, 64, 32)              0           
##                                                                                 
##  conv2d_1 (Conv2D)                  (None, 64, 64, 64)              18496       
##                                                                                 
##  max_pooling2d_1 (MaxPooling2D)     (None, 32, 32, 64)              0           
##                                                                                 
##  conv2d (Conv2D)                    (None, 32, 32, 128)             73856       
##                                                                                 
##  max_pooling2d (MaxPooling2D)       (None, 16, 16, 128)             0           
##                                                                                 
##  flatten (Flatten)                  (None, 32768)                   0           
##                                                                                 
##  dense (Dense)                      (None, 128)                     4194432     
##                                                                                 
##  Output (Dense)                     (None, 3)                       387         
##                                                                                 
## ================================================================================
## Total params: 4,288,067
## Trainable params: 4,288,067
## Non-trainable params: 0
## ________________________________________________________________________________

Demonstrated how to properly do model fitting and evaluation

# Compile Model
model %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(learning_rate = 0.001),
    metrics = "accuracy"
  )
# Fit data into model
history <- model %>% 
  fit(
  # training data
  train_image_array_gen,

  # training epochs
  steps_per_epoch = as.integer(train_samples / batch_size), 
  epochs = 35, 
  
  # validation data
  validation_data = val_image_array_gen,
  validation_steps = as.integer(valid_samples / batch_size)
)

plot(history)
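
Besides the plot, the recorded metrics can also be read from the history object, for example to check the validation accuracy of the final epoch. This is a minimal sketch; the exact element names depend on the metrics configured in compile():

# Validation accuracy recorded at the last epoch
tail(history$metrics$val_accuracy, 1)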

Model Evaluation

Now we will further evaluate the model and obtain the confusion matrix using the validation data from the generator. First, we need to get the file names of the images used as validation data. From the file names, we extract the class label as the actual value of the target variable.

val_data <- data.frame(file_name = paste0("Capstone ML Data/7. where-were-you-cl/data/train/",
                                          val_image_array_gen$filenames)) %>% 
  mutate(class = str_extract(file_name, "beach|forest|mountain"))

head(val_data, 10)

Since the input dimension of our CNN model is a 128 x 128 pixel image with 3 color channels (RGB), we do the same for the validation images. We convert them into an array ourselves rather than using the image generator, because we want to predict the original images straight from the folder; the generator would transform the images and no longer reflect the actual data.

# Function to convert image to array
image_prep <- function(x) {
  arrays <- lapply(x, function(path) {
    img <- image_load(path, target_size = target_size, 
                      grayscale = F # Set FALSE if image is RGB
                      )
    
    x <- image_to_array(img)
    x <- array_reshape(x, c(1, dim(x)))
    x <- x/255 # rescale image pixel
  })
  do.call(abind::abind, c(arrays, list(along = 1)))
}
val_x <- image_prep(val_data$file_name)

# Check dimension of testing data set
dim(val_x)
## [1] 264 128 128   3

The validation data consists of 264 images with dimensions of 128 x 128 pixels and 3 color channels (RGB). Now that the validation data is prepared, we can proceed to predict the label of each image using our CNN model.

pred_val <- predict(model, val_x) %>% k_argmax() %>% as.array()
head(pred_val, 10)
##  [1] 0 0 0 0 0 0 0 0 0 0

To make the predictions easier to interpret, we will convert the encoded class indices into the proper class labels.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "beach",
            x == 1 ~ "forest",
            x == 2 ~ "mountain"
            )
}

pred_val <- sapply(pred_val, decode) 

head(pred_val, 10)
##  [1] "beach" "beach" "beach" "beach" "beach" "beach" "beach" "beach" "beach"
## [10] "beach"

Finally, we evaluate the model using the confusion matrix.

confusionMatrix(as.factor(pred_val), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction beach forest mountain
##   beach       78      0        7
##   forest       0     67        1
##   mountain     7     13       91
## 
## Overall Statistics
##                                                
##                Accuracy : 0.8939               
##                  95% CI : (0.8504, 0.9284)     
##     No Information Rate : 0.375                
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.8395               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: beach Class: forest Class: mountain
## Sensitivity                0.9176        0.8375          0.9192
## Specificity                0.9609        0.9946          0.8788
## Pos Pred Value             0.9176        0.9853          0.8198
## Neg Pred Value             0.9609        0.9337          0.9477
## Prevalence                 0.3220        0.3030          0.3750
## Detection Rate             0.2955        0.2538          0.3447
## Detection Prevalence       0.3220        0.2576          0.4205
## Balanced Accuracy          0.9393        0.9160          0.8990

The model is able to find the features and characteristics of the three image classes and performs the classification quite well.
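
To see where the model still struggles, we can also list the validation images whose predicted label differs from the actual label. This is a small sketch built from the val_data frame and the predictions above:

# Show a few misclassified validation images
val_data %>% 
  mutate(pred = pred_val) %>% 
  filter(pred != class) %>% 
  head()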

Tuning The Model

Let’s look back at our model architecture. As you may have noticed, we can actually extract more information while the data is still a 2D image array. The CNN layers extract the general features of our image, which are then downsampled by the max pooling layers. Even after pooling three times, we still have a 16 x 16 array that holds a lot of information to extract before flattening the data. Therefore, we can stack more CNN layers onto the model so that more information is captured.

model
## Model
## Model: "CNN_Model"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_2 (Conv2D)                  (None, 128, 128, 32)            896         
##                                                                                 
##  max_pooling2d_2 (MaxPooling2D)     (None, 64, 64, 32)              0           
##                                                                                 
##  conv2d_1 (Conv2D)                  (None, 64, 64, 64)              18496       
##                                                                                 
##  max_pooling2d_1 (MaxPooling2D)     (None, 32, 32, 64)              0           
##                                                                                 
##  conv2d (Conv2D)                    (None, 32, 32, 128)             73856       
##                                                                                 
##  max_pooling2d (MaxPooling2D)       (None, 16, 16, 128)             0           
##                                                                                 
##  flatten (Flatten)                  (None, 32768)                   0           
##                                                                                 
##  dense (Dense)                      (None, 128)                     4194432     
##                                                                                 
##  Output (Dense)                     (None, 3)                       387         
##                                                                                 
## ================================================================================
## Total params: 4,288,067
## Trainable params: 4,288,067
## Non-trainable params: 0
## ________________________________________________________________________________

The following is our improved model architecture:

  • 1st Convolutional layer to extract features from the 2D image with relu activation function
  • Max pooling layer
  • 2nd Convolutional layer to extract features from the 2D image with relu activation function
  • Max pooling layer
  • 3rd Convolutional layer to extract features from the 2D image with relu activation function
  • Max pooling layer
  • 4th Convolutional layer to extract features from the 2D image with relu activation function
  • Max pooling layer
  • 5th Convolutional layer to extract features from the 2D image with relu activation function
  • Max pooling layer
  • Flattening layer from 2D array to 1D array
  • Dense layer to capture more information
  • Dense layer for output with softmax activation function
# Set Initial Random Weight
tensorflow::tf$random$set_seed(123)

model_tunning <- keras_model_sequential(name = "CNN_Mode_tunningl") %>% 
  
  # Convolution Layer 1
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = c(target_size, 3) 
                ) %>% 

  # Max Pooling Layer 1
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>% 
  
  # Convolution Layer 2
  layer_conv_2d(filters = 64,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>% 

  # Max Pooling Layer 2
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>% 
  
  # Convolution Layer 3
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max Pooling Layer 3
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>%
  
  # Convolution Layer 4
  layer_conv_2d(filters = 256,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max Pooling Layer 4
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>%
  
   # Convolution Layer 5
  layer_conv_2d(filters = 256,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max Pooling Layer 5
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = c(2,2)) %>%
  
  # Flattening Layer
  layer_flatten() %>% 
  
  # Dense Layer1
  layer_dense(units = 128,
              activation = "relu") %>%
  
  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax",
              name = "Output")

# Compile Model
model_tunning %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(learning_rate = 0.001),
    metrics = "accuracy"
  )
  
model_tunning
## Model
## Model: "CNN_Mode_tunningl"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_7 (Conv2D)                  (None, 128, 128, 32)            896         
##                                                                                 
##  max_pooling2d_7 (MaxPooling2D)     (None, 64, 64, 32)              0           
##                                                                                 
##  conv2d_6 (Conv2D)                  (None, 64, 64, 64)              18496       
##                                                                                 
##  max_pooling2d_6 (MaxPooling2D)     (None, 32, 32, 64)              0           
##                                                                                 
##  conv2d_5 (Conv2D)                  (None, 32, 32, 128)             73856       
##                                                                                 
##  max_pooling2d_5 (MaxPooling2D)     (None, 16, 16, 128)             0           
##                                                                                 
##  conv2d_4 (Conv2D)                  (None, 16, 16, 256)             295168      
##                                                                                 
##  max_pooling2d_4 (MaxPooling2D)     (None, 8, 8, 256)               0           
##                                                                                 
##  conv2d_3 (Conv2D)                  (None, 8, 8, 256)               590080      
##                                                                                 
##  max_pooling2d_3 (MaxPooling2D)     (None, 4, 4, 256)               0           
##                                                                                 
##  flatten_1 (Flatten)                (None, 4096)                    0           
##                                                                                 
##  dense_1 (Dense)                    (None, 128)                     524416      
##                                                                                 
##  Output (Dense)                     (None, 3)                       387         
##                                                                                 
## ================================================================================
## Total params: 1,503,299
## Trainable params: 1,503,299
## Non-trainable params: 0
## ________________________________________________________________________________

We can once again fit the model to the data. We will train it with the same number of epochs and the same learning rate as before.

# Fit data into model
history <- model_tunning %>% 
  fit(
  # training data
  train_image_array_gen,

  # training epochs
  steps_per_epoch = as.integer(train_samples / batch_size), 
  epochs = 35, 
  
  # validation data
  validation_data = val_image_array_gen,
  validation_steps = as.integer(valid_samples / batch_size)
)

plot(history)

Now we will further evaluate the tuned model and obtain the confusion matrix for the validation data.

pred_val_tun <- predict(model_tunning, val_x) %>% k_argmax() %>% as.array()
head(pred_val_tun, 10)
##  [1] 0 0 0 0 0 0 0 0 0 0
pred_val_tun <- sapply(pred_val_tun, decode) 

head(pred_val_tun, 10)
##  [1] "beach" "beach" "beach" "beach" "beach" "beach" "beach" "beach" "beach"
## [10] "beach"

From the confusion matrix, the tuned model reaches a validation accuracy of about 0.93, a modest improvement over the roughly 0.89 obtained by the previous model.

confusionMatrix(as.factor(pred_val_tun), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction beach forest mountain
##   beach       81      0        9
##   forest       0     78        3
##   mountain     4      2       87
## 
## Overall Statistics
##                                                
##                Accuracy : 0.9318               
##                  95% CI : (0.8944, 0.9591)     
##     No Information Rate : 0.375                
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.8975               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: beach Class: forest Class: mountain
## Sensitivity                0.9529        0.9750          0.8788
## Specificity                0.9497        0.9837          0.9636
## Pos Pred Value             0.9000        0.9630          0.9355
## Neg Pred Value             0.9770        0.9891          0.9298
## Prevalence                 0.3220        0.3030          0.3750
## Detection Rate             0.3068        0.2955          0.3295
## Detection Prevalence       0.3409        0.3068          0.3523
## Balanced Accuracy          0.9513        0.9793          0.9212
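
As a simple side-by-side check of the two models, we can also compute the overall validation accuracy of each directly from the predicted and actual labels (a minimal sketch; it assumes pred_val and pred_val_tun from the earlier steps are still available):

# Compare overall validation accuracy of the base and tuned models
data.frame(model = c("base", "tuned"),
           accuracy = c(mean(pred_val == val_data$class),
                        mean(pred_val_tun == val_data$class)))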

Predicting the Test Dataset

After training the model and being satisfied with its performance on the validation dataset, we will use the tuned model to predict the labels of the test dataset. The test data is located in the test folder.

test <- read.csv("Capstone ML Data/7. where-were-you-cl/data/image-data-test.csv")

test_data <- data.frame(file_name = paste0("Capstone ML Data/7. where-were-you-cl/data/test/",
                                          test$id))

head(test_data, 10)

Then, we convert the test images into an array.

test_x <- image_prep(test_data$file_name)

# Check dimension of testing data set
dim(test_x)
## [1] 294 128 128   3

The testing data consists of 294 images with dimensions of 128 x 128 pixels and 3 color channels (RGB). Now that the test data is prepared, we can proceed to predict the label of each image using our tuned CNN model.

pred_test <- predict(model_tunning, test_x) %>% k_argmax() %>% as.array()
head(pred_test, 10)
##  [1] 1 1 2 2 2 2 2 1 1 2
pred_test <- sapply(pred_test, decode) 

head(pred_test, 10)
##  [1] "forest"   "forest"   "mountain" "mountain" "mountain" "mountain"
##  [7] "mountain" "forest"   "forest"   "mountain"
test$label <- pred_test
head(test)

Now we can export the data frame to submission.csv.

write.csv(test,"submission.csv",row.names = FALSE)
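
If we want to reuse the tuned model later without retraining it, we can also save it to disk. This is a minimal sketch; the file name is arbitrary:

# Save the tuned model for later use
save_model_hdf5(model_tunning, "cnn_model_tunning.hdf5")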

Conclusion

We have built a deep learning model that can find the features and characteristics of all three image classes (beach, forest, and mountain) and performs the classification quite well. The model used in this case is a CNN, which was tuned by stacking more convolutional layers to extract more information from the images. Similar approaches can be applied to other business cases, such as predicting Covid-19 from chest X-ray images or classifying whether a face is wearing a mask.