1 Introduction

Deep learning has proved to be a very powerful tool because of its ability to handle large amounts of data. Interest in neural networks with many hidden layers has grown past that in traditional techniques, especially in pattern recognition. One of the most popular deep neural networks is the Convolutional Neural Network. A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another. In this article, we will build a Convolutional Neural Network to classify images as beach, forest, or mountain.

knitr::include_graphics("new-zealand.jpeg")

2 Library and Setup

# Data wrangling
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Image manipulation
library(imager)
## Warning: package 'imager' was built under R version 4.0.5
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following object is masked from 'package:stringr':
## 
##     boundary
## The following object is masked from 'package:tidyr':
## 
##     fill
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
# Deep learning
library(tensorflow)
## Warning: package 'tensorflow' was built under R version 4.0.5
tf_version()
## [1] '2.4'
library(keras)

# Use conda environment
use_condaenv("r-tensorflow")

# Model Evaluation
library(caret)
## Warning: package 'caret' was built under R version 4.0.5
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:tensorflow':
## 
##     train
## The following object is masked from 'package:purrr':
## 
##     lift
options(scipen = 999)

3 Import Data

In image classification problems, it is common practice to put the images in separate folders based on their target class/label. For example, inside the train folder in our data, you can see that we have 3 different folders, one each for beach, forest, and mountain.

folder_list <- list.files("data/train/")
folder_path <- paste0("data/train/", folder_list, "/")
file_name <- map(folder_path,
                 function(x) paste0(x, list.files(x))
                 ) %>%
   unlist()
head(file_name)
## [1] "data/train/beach/beach_100.jpeg" "data/train/beach/beach_101.jpeg"
## [3] "data/train/beach/beach_102.jpeg" "data/train/beach/beach_103.jpeg"
## [5] "data/train/beach/beach_104.jpeg" "data/train/beach/beach_105.jpeg"
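
As a hedged alternative sketch, base R's list.files() with recursive = TRUE and full.names = TRUE can collect the same kind of paths in a single call, and the class label can be read from the name of the parent folder. The temporary folder, the empty placeholder files, and the names root, demo_files, and demo_labels are assumptions made only so the example runs without the real data.

```r
# Build a throwaway folder tree that mimics data/train/<class>/<image> (hypothetical files)
root <- file.path(tempdir(), "train_demo")
classes <- c("beach", "forest", "mountain")
for (cl in classes) {
  dir.create(file.path(root, cl), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, cl, paste0(cl, "_", 1:2, ".jpeg")))
}

# One call lists every image path, including those in the sub-folders
demo_files <- list.files(root, pattern = "\\.jpeg$", recursive = TRUE, full.names = TRUE)

# The label is the name of the folder that contains each file
demo_labels <- basename(dirname(demo_files))

table(demo_labels)
```

Reading the label from the folder name avoids hard-coding the class names, which may help if more classes are added later.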

4 Exploratory Data Analysis

First, we randomly check 6 images from the data.

set.seed(28)
sample_image <- sample(file_name, 6)

# Load image into R
img <- map(sample_image, load.image)

# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
map(img, plot)

## [[1]]
## Image. Width: 364 pix Height: 138 pix Depth: 1 Colour channels: 3 
## 
## [[2]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3 
## 
## [[3]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3 
## 
## [[4]]
## Image. Width: 275 pix Height: 183 pix Depth: 1 Colour channels: 3 
## 
## [[5]]
## Image. Width: 367 pix Height: 137 pix Depth: 1 Colour channels: 3 
## 
## [[6]]
## Image. Width: 325 pix Height: 155 pix Depth: 1 Colour channels: 3

4.1 Check Image Dimension

To build a deep learning model, we need to understand the dimensions of the input images. If the dimensions vary, we need to choose a proper target size for the model. Now let’s write a function to get the dimension information of each image.

get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}
# Run the get_dim() function for each image
file_dim <- map_df(file_name, get_dim)

head(file_dim, 10)
##    height width                        filename
## 1     181   279 data/train/beach/beach_100.jpeg
## 2     135   372 data/train/beach/beach_101.jpeg
## 3     170   296 data/train/beach/beach_102.jpeg
## 4     183   276 data/train/beach/beach_103.jpeg
## 5     190   266 data/train/beach/beach_104.jpeg
## 6     168   300 data/train/beach/beach_105.jpeg
## 7     159   318 data/train/beach/beach_106.jpeg
## 8     184   274 data/train/beach/beach_107.jpeg
## 9     225   225 data/train/beach/beach_108.jpeg
## 10    126   401 data/train/beach/beach_109.jpeg
summary(file_dim)
##      height          width         filename        
##  Min.   : 94.0   Min.   :100.0   Length:1453       
##  1st Qu.:168.0   1st Qu.:268.0   Class :character  
##  Median :183.0   Median :275.0   Mode  :character  
##  Mean   :177.8   Mean   :282.4                     
##  3rd Qu.:184.0   3rd Qu.:300.0                     
##  Max.   :314.0   Max.   :534.0

The image dimensions vary greatly. Some images are only about 100 pixels in height or width, while others reach more than 300 pixels in height or more than 500 pixels in width. Understanding the image dimensions will help us in the next part of the process, which is data pre-processing.

5 Data Pre-processing

Based on the previous summary of the image dimensions, we can determine the input dimension for the deep learning model. All input images must have the same dimensions. Larger dimensions keep more features but take longer to train, while images that are too small lose a lot of information. Balancing this trade-off is the art of data pre-processing in image classification. For now, we will try 128 x 128 pixels. We also set the batch size, which is the number of images the model processes before each weight update. Here, we set the batch size to 32.

# Desired height and width of images
target_size <- c(128, 128)

# Batch size for training the model
batch_size <- 32
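
To make the size trade-off concrete, the following minimal sketch counts how many numeric values a single RGB image contributes at a few candidate sizes. The sizes 64 and 256 are assumptions chosen only for comparison; the article itself uses 128 x 128.

```r
# Number of values fed to the network per RGB image for a few candidate sizes
candidate_size <- c(64, 128, 256)
input_values <- candidate_size * candidate_size * 3  # height x width x 3 color channels

data.frame(size = candidate_size, input_values = input_values)

# Input size of each candidate relative to the 128 x 128 choice
input_values / (128 * 128 * 3)
```

Doubling the side length multiplies the number of input values by four, which is one reason larger images train more slowly.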

Since our training set is small, we will create artificial data using a method called image augmentation. Image augmentation is a useful technique available in keras that modifies the training images without acquiring new ones. The goal is to teach the model not only with the original images but also with modified versions of them, such as flipped, rotated, zoomed, or cropped images. This makes the model more robust. To do image augmentation, we pass the data through a generator. Here, we will create the image generator for keras with the following properties:

  • Scale the pixel values by dividing them by 255
  • Flip the image horizontally
  • Rotate the image randomly by up to 20 degrees
  • Shear the image with a shear intensity of 0.2
  • Vary the brightness between 70% and 130% of the original
  • Shift the image left or right by up to 20% of its width
  • Shift the image up or down by up to 20% of its height
  • Fill pixels created by a transformation with the nearest existing pixel value
  • Use 20% of the training data as the validation dataset. Validation data is used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.

train_data_gen <- image_data_generator(rescale = 1/255, # Rescaling pixel value
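                                       horizontal_flip = TRUE, # Flip image horizontally
                                       rotation_range = 20, # Rotate image randomly by up to 20 degrees
                                       validation_split = 0.2, # 20% data as validation data
                                       shear_range = 0.2, # Shear intensity of 0.2
                                       brightness_range = c(0.7, 1.3), # Brightness between 70% and 130%
                                       width_shift_range = 0.2, # Shift left or right by up to 20%
                                       height_shift_range = 0.2, # Shift up or down by up to 20%
                                       fill_mode = "nearest" # Fill new pixels with the nearest value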
                                       )
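
As a rough illustration of two of these transformations, the sketch below applies rescaling and a horizontal flip to a tiny made-up matrix in base R. It shows only the arithmetic on one channel and is not how keras implements the generator internally; the names toy, toy_scaled, and toy_flipped are hypothetical.

```r
# A toy 2 x 3 single-channel "image" with raw pixel values in 0-255 (made-up numbers)
toy <- matrix(c(  0,  51, 102,
                153, 204, 255),
              nrow = 2, byrow = TRUE)

# rescale = 1/255 maps every pixel into the range 0-1
toy_scaled <- toy / 255

# A horizontal flip reverses the column order (left becomes right)
toy_flipped <- toy_scaled[, ncol(toy_scaled):1]

toy_scaled
toy_flipped
```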

Now it’s time to feed the image data into the generator using flow_images_from_directory(). The data is located inside the train folder within the data folder, so the directory will be data/train. From this process, we will get augmented images for both the training data and the validation data.

# Training Dataset
train_image_array_gen <- flow_images_from_directory(directory = "data/train/", # Folder of the data
                                                    target_size = target_size, # target of the image dimension (128 x 128)  
                                                    color_mode = "rgb", # use RGB color
                                                    batch_size = batch_size , 
                                                    seed = 28,  # set random seed
                                                    subset = "training", # declare that this is for training data
                                                    class_mode = "categorical",
                                                    generator = train_data_gen
                                                    )

# Validation Dataset
val_image_array_gen <- flow_images_from_directory(directory = "data/train/",
                                                  target_size = target_size, 
                                                  color_mode = "rgb", 
                                                  batch_size = batch_size ,
                                                  seed = 28,
                                                  subset = "validation", # declare that this is the validation data
                                                  class_mode = "categorical",
                                                  generator = train_data_gen
                                                  )

Next, we collect some information from the generator and check the class proportions of the training dataset. Each index corresponds to a label of the target variable, ordered alphabetically (beach, forest, and mountain).

# Number of training samples
train_samples <- train_image_array_gen$n

# Number of validation samples
valid_samples <- val_image_array_gen$n

# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)

table("\nFrequency" = factor(train_image_array_gen$classes)
      ) %>% 
  prop.table()
## 
## Frequency
##         0         1         2 
## 0.3324742 0.3256014 0.3419244

For now the classes are well balanced. Originally, the class proportions of the training dataset were around 0.32, 0.30, and 0.37, so I balanced the data by manually duplicating images (oversampling). Imbalanced classes can bias a classifier toward the majority class and make its results inaccurate and unsatisfactory.

6 Convolutional Neural Network

6.1 Model Architecture

After data pre-processing, we will start building the architecture of the deep learning model. We will build a first model with the following layers:

  • 1st Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 2nd Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 3rd Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 4th Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • Flattening layer to flatten data from 2D array to 1D array
  • 1st Dense layer to capture more information with relu activation function
  • 2nd Dense layer to capture more information with relu activation function
  • Dropout layer to randomly drop units during training, which helps prevent overfitting
  • Dense layer for output with softmax activation function
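
To illustrate what the max pooling layers in this list do, here is a minimal base R sketch of 2 x 2 max pooling with stride 2 on a made-up 4 x 4 feature map. The helper max_pool_2x2() is hypothetical and only mimics the idea on a single channel; it is not the keras implementation.

```r
# Toy 4 x 4 feature map (made-up values)
fmap <- matrix(c(1, 3, 2, 0,
                 5, 4, 1, 1,
                 0, 2, 9, 6,
                 1, 1, 7, 8),
               nrow = 4, byrow = TRUE)

# 2 x 2 max pooling with stride 2: keep the largest value in each non-overlapping block
max_pool_2x2 <- function(m) {
  out <- matrix(NA_real_, nrow(m) / 2, ncol(m) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      rows <- (2 * i - 1):(2 * i)
      cols <- (2 * j - 1):(2 * j)
      out[i, j] <- max(m[rows, cols])
    }
  }
  out
}

max_pool_2x2(fmap)
```

Each pooling step halves the height and width, which is why a 128 x 128 input shrinks to 64 x 64, then 32 x 32, and so on.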

Don’t forget to set the input size in the first layer. If the input image is in RGB, set the final number to 3, which is the number of color channels. If the input image is in grayscale, set the final number to 1.

input_shape <- c(target_size, 3)
tf$random$set_seed(28)
model <- keras_model_sequential(name = "where_am_I") %>% 
  
  # Convolution and Max Pooling Layers
  
  layer_conv_2d(filters = 16,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = input_shape) %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  layer_conv_2d(filters = 64,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 

  # Flattening Layer
  layer_flatten() %>% 
  # Dense Layer
  layer_dense(units = 128,
              activation = "relu") %>% 
  layer_dense(units = 64,
              activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax",
              name = "Output")
  

model
## Model
## Model: "where_am_I"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d_3 (Conv2D)                   (None, 128, 128, 16)            448         
## ________________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D)      (None, 64, 64, 16)              0           
## ________________________________________________________________________________
## conv2d_2 (Conv2D)                   (None, 64, 64, 32)              4640        
## ________________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D)      (None, 32, 32, 32)              0           
## ________________________________________________________________________________
## conv2d_1 (Conv2D)                   (None, 32, 32, 64)              18496       
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)      (None, 16, 16, 64)              0           
## ________________________________________________________________________________
## conv2d (Conv2D)                     (None, 16, 16, 128)             73856       
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 8, 8, 128)               0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 8192)                    0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 128)                     1048704     
## ________________________________________________________________________________
## dense (Dense)                       (None, 64)                      8256        
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 64)                      0           
## ________________________________________________________________________________
## Output (Dense)                      (None, 3)                       195         
## ================================================================================
## Total params: 1,154,595
## Trainable params: 1,154,595
## Non-trainable params: 0
## ________________________________________________________________________________

We start by feeding image data of 128 x 128 pixels into the first convolutional layer, which has 16 filters to extract features from the image. The padding = "same" argument keeps the feature maps at 128 x 128 pixels after the convolution. We then downsample by taking only the maximum value in each 2 x 2 pooling area, so the data now has 64 x 64 pixels from 16 filters. This repeats until we get 8 x 8 pixels from 128 filters. After that, we flatten the 8 x 8 x 128 array into a 1D array with 8192 nodes. We then extract further information using the dense layers and finish by passing the information to the output layer, where the softmax activation function turns it into the probability of each class.
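
The parameter counts in the model summary can be reproduced by hand. The sketch below assumes the standard formulas for convolutional and dense layers with a bias term; conv_params() and dense_params() are hypothetical helper names, not keras functions.

```r
# Parameters of a conv layer: (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv_params <- function(in_ch, filters, k = 3) (k * k * in_ch + 1) * filters

# Parameters of a dense layer: (inputs + 1 bias) * units
dense_params <- function(inputs, units) (inputs + 1) * units

params <- c(
  conv_1  = conv_params(3, 16),
  conv_2  = conv_params(16, 32),
  conv_3  = conv_params(32, 64),
  conv_4  = conv_params(64, 128),
  dense_1 = dense_params(8 * 8 * 128, 128),
  dense_2 = dense_params(128, 64),
  output  = dense_params(64, 3)
)

params
sum(params)
```

Most of the parameters sit in the first dense layer, because it connects all 8192 flattened values to 128 units.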

6.2 Model Fitting

Now that the data and the model are ready, it is time to fit the model to the data. For a start, we will train for 25 epochs. For multi-class classification, we use categorical cross-entropy as the loss function. We use the adam optimizer with the default learning rate (0.001). We will also evaluate the model with the validation data from the generator.

model%>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(),
    metrics = "accuracy"
  )
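
For intuition about the loss chosen here, the following minimal sketch computes softmax probabilities and the categorical cross-entropy for a single image in base R. The raw scores are made up, and softmax() and cross_entropy() are hypothetical helpers that illustrate the textbook formulas rather than the keras internals.

```r
# Softmax turns raw output scores into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

# Categorical cross-entropy for one image: -sum(one-hot target * log(probability))
cross_entropy <- function(y_true, y_prob) -sum(y_true * log(y_prob))

scores <- c(beach = 2.0, forest = 0.5, mountain = 0.1)  # made-up raw scores
probs  <- softmax(scores)
target <- c(1, 0, 0)  # one-hot label for "beach"

probs
cross_entropy(target, probs)
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so confident correct predictions give a loss close to 0.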

# Fit data into model
history <- model %>% 
  fit(
  # training data
  train_image_array_gen,

  # training epochs
  steps_per_epoch = as.integer(train_samples/batch_size),
  epochs = 25, 
  
  # validation data
  validation_data = val_image_array_gen,
  validation_steps = as.integer(valid_samples/batch_size),
  verbose = 2
)

plot(history)
## `geom_smooth()` using formula 'y ~ x'
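
The step counts passed to fit() follow from simple arithmetic. The sketch below assumes 1164 training images, a figure derived here from the 1453 files in the dimension summary minus the 289 validation images reported in this article, not one printed by the generator. It also compares as.integer() with ceiling() as one possible alternative.

```r
batch_size    <- 32
train_samples <- 1453 - 289  # derived: total images minus validation images
valid_samples <- 289

# as.integer() truncates, so a final partial batch is not counted
steps_floor <- as.integer(train_samples / batch_size)
val_floor   <- as.integer(valid_samples / batch_size)

# ceiling() would add one extra step to cover the leftover images
steps_ceiling <- ceiling(train_samples / batch_size)

c(steps_floor = steps_floor, val_floor = val_floor, steps_ceiling = steps_ceiling)

# Images left outside the counted full batches of training data
train_samples - steps_floor * batch_size
```

Under these assumptions each epoch draws 36 training batches and 9 validation batches; whether to round down or up is a design choice.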

6.3 Model Evaluation

Now we will evaluate the model further and build the confusion matrix using the validation data from the generator. First, we need the file names of the images used as validation data. From each file name, we extract the class label as the actual value of the target variable.

val_data <- data.frame(file_name = paste0("data/train/", val_image_array_gen$filenames)) %>% 
  mutate(class = str_extract(file_name, "beach|forest|mountain"))

head(val_data, 10)
##                           file_name class
## 1  data/train/beach\\beach_100.jpeg beach
## 2  data/train/beach\\beach_101.jpeg beach
## 3  data/train/beach\\beach_102.jpeg beach
## 4  data/train/beach\\beach_103.jpeg beach
## 5  data/train/beach\\beach_104.jpeg beach
## 6  data/train/beach\\beach_105.jpeg beach
## 7  data/train/beach\\beach_106.jpeg beach
## 8  data/train/beach\\beach_107.jpeg beach
## 9  data/train/beach\\beach_108.jpeg beach
## 10 data/train/beach\\beach_109.jpeg beach

We need to get the images into R by converting them into arrays. Since the input dimension of our CNN model is 128 x 128 pixels with 3 color channels (RGB), we will use the same dimensions for the validation images. We use arrays because we want to predict the original images fresh from the folder. We do not use the image generator here, since it transforms the images and would not reflect the actual images.

image_prep <- function(x) {
  arrays <- lapply(x, function(path) {
    img <- image_load(path, target_size = target_size, 
                      grayscale = F # Set FALSE if image is RGB
                      )
    
    x <- image_to_array(img)
    x <- array_reshape(x, c(1, dim(x)))
    x <- x/255 # rescale image pixel
  })
  do.call(abind::abind, c(arrays, list(along = 1)))
}
val_x <- image_prep(val_data$file_name)

# Check dimension of testing data set
dim(val_x)
## [1] 289 128 128   3

The validation data consists of 289 images with dimensions of 128 x 128 pixels and 3 color channels (RGB). Now that the validation data is prepared, we can predict the label of each image using our CNN model.

pred_val1 <- predict_classes(model, val_x)

head(pred_val1, 10)
##  [1] 0 0 0 0 0 0 0 0 0 0

For easier presentation, we will convert the encoding into the true class label.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "beach",
            x == 1 ~ "forest",
            x == 2 ~ "mountain"
            )
}
pred_val1 <- sapply(pred_val1, decode) 
head(pred_val1)
## [1] "beach" "beach" "beach" "beach" "beach" "beach"
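
As a hedged alternative to the case_when() helper, the same decoding can be done with a vectorized lookup, because the class indices are 0-based and follow the alphabetical order of the folders. The names class_names and decode_lookup() are hypothetical, and the input vector is made up.

```r
# Class names in the alphabetical order used by the generator
class_names <- c("beach", "forest", "mountain")

# Keras class indices start at 0, while R vectors start at 1, hence the + 1
decode_lookup <- function(x) class_names[x + 1]

decode_lookup(c(0, 2, 1, 0))
```

A lookup like this handles the whole vector at once, so no sapply() call is needed.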

Finally, we evaluate the model using the confusion matrix below. The accuracy of the first model is fairly high at 91%. The sensitivity, specificity, and precision (Pos Pred Value) are also above 85% for every class. Next, we will see whether the model can be improved by changing its architecture.

confusionMatrix(as.factor(pred_val1), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction beach forest mountain
##   beach       86      0        7
##   forest       0     91        6
##   mountain    10      3       86
## 
## Overall Statistics
##                                                
##                Accuracy : 0.91                 
##                  95% CI : (0.871, 0.9404)      
##     No Information Rate : 0.3426               
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.865                
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: beach Class: forest Class: mountain
## Sensitivity                0.8958        0.9681          0.8687
## Specificity                0.9637        0.9692          0.9316
## Pos Pred Value             0.9247        0.9381          0.8687
## Neg Pred Value             0.9490        0.9844          0.9316
## Prevalence                 0.3322        0.3253          0.3426
## Detection Rate             0.2976        0.3149          0.2976
## Detection Prevalence       0.3218        0.3356          0.3426
## Balanced Accuracy          0.9298        0.9687          0.9001
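
The headline metrics can be recomputed directly from the confusion matrix reported above. This is a minimal base R sketch, not a replacement for caret's confusionMatrix(); the object names cm, accuracy, sensitivity, and precision are chosen here for illustration.

```r
# Confusion matrix reported above: rows are predictions, columns are actual classes
classes <- c("beach", "forest", "mountain")
cm <- matrix(c(86,  0,  7,
                0, 91,  6,
               10,  3, 86),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

accuracy    <- sum(diag(cm)) / sum(cm)  # correct predictions / all predictions
sensitivity <- diag(cm) / colSums(cm)   # correct / actual members of the class
precision   <- diag(cm) / rowSums(cm)   # Pos Pred Value: correct / predicted as the class

round(accuracy, 4)
round(sensitivity, 4)
round(precision, 4)
```

The diagonal holds the correctly classified images, so most of the errors here come from confusion between beach and mountain.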

7 Model Tuning

7.1 Model Architecture

We will try adding one more convolutional layer and max pooling layer to extract more features from the images. Here is our new model architecture:

  • 1st Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 2nd Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 3rd Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 4th Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • 5th Convolutional layer to extract features from 2D image with relu activation function
  • Max Pooling layer to downsample the image features
  • Flattening layer to flatten data from 2D array to 1D array
  • 1st Dense layer to capture more information with relu activation function
  • 2nd Dense layer to capture more information with relu activation function
  • Dropout layer to help prevent overfitting
  • Dense layer for output with softmax activation function

We could be more creative when designing our own model architecture. But since the first model already reaches an accuracy of 91%, we will simply add one more convolutional and max pooling layer.

tf$random$set_seed(28)
model_tune2 <- keras_model_sequential(name = "where_am_I") %>% 
  
  # Convolution Layer and Max Pooling Layer
  
  layer_conv_2d(filters = 16,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = input_shape) %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  layer_conv_2d(filters = 64,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  layer_conv_2d(filters = 256,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%

  # Flattening Layer
  layer_flatten() %>% 
  # Dense Layer
  layer_dense(units = 128,
              activation = "relu") %>% 
  layer_dense(units = 64,
              activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%

  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax",
              name = "Output")
  

model_tune2
## Model
## Model: "where_am_I"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d_8 (Conv2D)                   (None, 128, 128, 16)            448         
## ________________________________________________________________________________
## max_pooling2d_8 (MaxPooling2D)      (None, 64, 64, 16)              0           
## ________________________________________________________________________________
## conv2d_7 (Conv2D)                   (None, 64, 64, 32)              4640        
## ________________________________________________________________________________
## max_pooling2d_7 (MaxPooling2D)      (None, 32, 32, 32)              0           
## ________________________________________________________________________________
## conv2d_6 (Conv2D)                   (None, 32, 32, 64)              18496       
## ________________________________________________________________________________
## max_pooling2d_6 (MaxPooling2D)      (None, 16, 16, 64)              0           
## ________________________________________________________________________________
## conv2d_5 (Conv2D)                   (None, 16, 16, 128)             73856       
## ________________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D)      (None, 8, 8, 128)               0           
## ________________________________________________________________________________
## conv2d_4 (Conv2D)                   (None, 8, 8, 256)               295168      
## ________________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D)      (None, 4, 4, 256)               0           
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 4096)                    0           
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 128)                     524416      
## ________________________________________________________________________________
## dense_2 (Dense)                     (None, 64)                      8256        
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 64)                      0           
## ________________________________________________________________________________
## Output (Dense)                      (None, 3)                       195         
## ================================================================================
## Total params: 925,475
## Trainable params: 925,475
## Non-trainable params: 0
## ________________________________________________________________________________
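
Although this model has one more convolutional layer, its total parameter count is lower than that of the first model (925,475 versus 1,154,595). The sketch below, which assumes the same standard dense-layer formula with a bias term, shows why: the extra pooling step halves the flattened input of the first dense layer. The helper side_after() is a hypothetical name.

```r
# Spatial size after k rounds of 2 x 2 pooling on a 128-pixel side
side_after <- function(k, start = 128) start / 2^k

# Flattened length = side * side * number of filters in the last conv layer
flat_first <- side_after(4)^2 * 128  # first model: 4 poolings, 128 filters
flat_tuned <- side_after(5)^2 * 256  # tuned model: 5 poolings, 256 filters

# Parameters of the first dense layer (128 units): (inputs + 1 bias) * units
dense_first <- (flat_first + 1) * 128
dense_tuned <- (flat_tuned + 1) * 128

c(flat_first = flat_first, flat_tuned = flat_tuned,
  dense_first = dense_first, dense_tuned = dense_tuned)
```

The saving in the dense layer (524,288 parameters) outweighs the 295,168 parameters added by the fifth convolutional layer.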

7.2 Model Fitting

We use the same adam optimizer and categorical cross-entropy loss, and we again train for 25 epochs.

model_tune2%>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_adam(),
    metrics = "accuracy"
  )

# Fit data into model
history_tune2 <- model_tune2 %>% 
  fit(
  # training data
  train_image_array_gen,

  # training epochs
  steps_per_epoch = as.integer(train_samples/batch_size),
  epochs = 25, 
  
  # validation data
  validation_data = val_image_array_gen,
  validation_steps = as.integer(valid_samples/batch_size),
  verbose = 2
)

plot(history_tune2)
## `geom_smooth()` using formula 'y ~ x'

7.3 Model Evaluation

pred_val <- predict_classes(model_tune2, val_x)

head(pred_val, 10)
##  [1] 0 0 0 0 0 0 0 0 0 0
# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "beach",
            x == 1 ~ "forest",
            x == 2 ~ "mountain"
            )
}
pred_val <- sapply(pred_val, decode) 
head(pred_val)
## [1] "beach" "beach" "beach" "beach" "beach" "beach"
confusionMatrix(as.factor(pred_val), 
                as.factor(val_data$class)
                )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction beach forest mountain
##   beach       84      0        5
##   forest       0     89        6
##   mountain    12      5       88
## 
## Overall Statistics
##                                                
##                Accuracy : 0.9031               
##                  95% CI : (0.863, 0.9347)      
##     No Information Rate : 0.3426               
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.8546               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: beach Class: forest Class: mountain
## Sensitivity                0.8750        0.9468          0.8889
## Specificity                0.9741        0.9692          0.9105
## Pos Pred Value             0.9438        0.9368          0.8381
## Neg Pred Value             0.9400        0.9742          0.9402
## Prevalence                 0.3322        0.3253          0.3426
## Detection Rate             0.2907        0.3080          0.3045
## Detection Prevalence       0.3080        0.3287          0.3633
## Balanced Accuracy          0.9245        0.9580          0.8997

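The overall accuracy drops slightly, from 0.91 to 0.9031. Sensitivity improves for mountain (0.8687 to 0.8889) but decreases for beach (0.8958 to 0.8750) and forest (0.9681 to 0.9468), so the deeper model does not outperform the first one.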

8 Conclusion

Our goal was to classify images as beach, forest, or mountain. Both models achieved this goal, with an accuracy of about 90% and per-class sensitivity, specificity, and precision above 80%. This suggests that this image classification problem can be solved with a Convolutional Neural Network. CNNs are commonly used for computer vision and image recognition. Researchers now apply them to OCR, object detection for self-driving cars, face recognition for social media, quantitative investment in finance, medical imaging, and many more.