# clear-up the environment
rm(list = ls())

# chunk options
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.align = "center",
  comment = "#>"
)

1 Intro

This article was written to complete an assignment for the Algoritma Machine Learning Course on Neural Networks and Deep Learning, Theia Batch 2022.

In this article I build an image classification model using the Sign Language MNIST dataset of American Sign Language letters.

The original MNIST image dataset of handwritten digits is a popular benchmark for image-based machine learning methods but researchers have renewed efforts to update it and develop drop-in replacements that are more challenging for computer vision and original for real-world applications. As noted in one recent replacement called the Fashion-MNIST dataset, the Zalando researchers quoted the startling claim that “Most pairs of MNIST digits (784 total pixels per sample) can be distinguished pretty well by just one pixel”. To stimulate the community to develop more drop-in replacements, the Sign Language MNIST is presented here and follows the same CSV format with labels and pixel values in single rows.

The American Sign Language letter database of hand gestures represents a multi-class problem with 24 classes of letters (excluding J and Z, which require motion).

This is the complete American Sign Language alphabet:

[Figure: American Sign Language Complete]

This is the alphabet represented in the dataset:

[Figure: American Sign Language in This Dataset]

So instead of 26 letter classes, we will use 24, excluding J and Z.

2 Exploratory Data Analysis

#Load Library
library(tufte)
library(ggplot2)
library(tidyverse)
library(dplyr)
library(caret)
library(keras)
library(tensorflow)

2.1 Import Dataset

The dataset is provided as .csv files. Each file contains one label column indicating which letter the image shows; the remaining columns hold the pixel values of that image.

There are two .csv files: one for training and one for validation. Both include labels, so predictions on the validation set can be checked against the true letters.

#Import Dataset
train_hand <- read.csv("data-input/sign_mnist_train.csv")
test_hand <- read.csv("data-input/sign_mnist_test.csv")

#Check Dimension of Dataset
dim(train_hand)
#> [1] 27455   785
dim(test_hand)
#> [1] 7172  785

The dataset contains 27,455 images for training and 7,172 images for validation (testing).

2.2 Check Dataset and Data Wrangling

Check the structure of the data using the head() function.

head(train_hand)
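# Range over all pixel columns.
# Note: `train_hand$pixel1:train_hand$pixel784` uses only the first element of
# each column, so it returned 107 202 (two pixels of the first image) rather
# than the range of the whole dataset.
range(as.matrix(train_hand[, -1]))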

Based on the output above, the data frame contains 785 columns: one label column and 784 pixel columns. The images are grayscale, so each pixel has a single channel (one value per pixel).

We want to check how many classes are in the dataset:

sort(unique(train_hand$label))
#>  [1]  0  1  2  3  4  5  6  7  8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
length(sort(unique(train_hand$label)))
#> [1] 24

The labels run from 0 to 24, which would be 25 values, but there are only 24 classes: the value 9 is missing.
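A quick way to locate the gap is to compare the observed labels with the full 0 to 24 range. The sketch below uses a hard-coded copy of the label vector printed above (`observed_labels` is a name introduced here for illustration) and assumes the labels index the alphabet starting from A = 0.

```r
# Labels observed in the training set, copied from the output above
observed_labels <- c(0:8, 10:24)

# Which value in 0..24 never appears?
missing_label <- setdiff(0:24, observed_labels)
missing_label

# Assuming A = 0, B = 1, ..., look up the letter of the missing label
missing_letter <- LETTERS[missing_label + 1]
missing_letter
```

Under that assumption the missing label is the one for J, which matches the dataset description: J requires motion and is excluded. Z would be label 25, which never appears either.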

We take a closer look at the labels around this gap:

#Create Function for Pixels Visualization
vizTrain <- function(input){
  
  dimmax <- sqrt(ncol(input[,-1]))
  
  dimn <- ceiling(sqrt(nrow(input)))
  par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
  
  for (i in 1:nrow(input)){
      m1 <- as.matrix(input[i,2:785])
      dim(m1) <- c(28,28)
      
      m1 <- apply(apply(m1, 1, rev), 1, t)
      
      image(1:28, 1:28, 
            m1, col=grey.colors(255), 
            # remove axis text
            xaxt = 'n', yaxt = 'n')
      text(2, 20, col="white", cex=1.2, input[i, 1])
  }
  
}

First, we take a look at label 0 :

# Check Label 0
vizTrain(head(train_hand[train_hand$label == 0,], 9))

We see that class 0 represents the letter A, based on the ASL picture earlier in this article.

Next, we take a look at label 8 and label 10 :

# Check Label 8
vizTrain(head(train_hand[train_hand$label == 8,], 9))

# Check Label 10
vizTrain(head(train_hand[train_hand$label == 10,], 9))

We see that the labels 0 to 24 correspond to the letters A to Y in order, with 9 (J) unused. We therefore shift the labels above 9 down by one, in both the training and testing datasets, so that the classes run from 0 to 23 without a gap.

(If anyone has a suggestion for better code, I’m open to correction! Please give me some insight.)

# Shift labels 9-24 down by one so the classes run 0-23 (training dataset).
# Labels 0-8 are unchanged; label 9 (J) never occurs in the data.
train_hand <- train_hand %>% 
  mutate(label = as.numeric(ifelse(label >= 9, label - 1, label)))
# Shift labels 9-24 down by one so the classes run 0-23 (testing dataset).
# Labels 0-8 are unchanged; label 9 (J) never occurs in the data.
test_hand <- test_hand %>% 
  mutate(label = as.numeric(ifelse(label >= 9, label - 1, label)))
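Once the labels run from 0 to 23, it is handy to translate a class back into its letter when reading predictions. The following is a small sketch, not part of the original workflow; `asl_letters` and `label_to_letter()` are hypothetical names introduced here.

```r
# The 24 static letters in order: the alphabet without J and Z
asl_letters <- LETTERS[!(LETTERS %in% c("J", "Z"))]

# Hypothetical helper: convert a relabelled class (0..23) to its letter
label_to_letter <- function(label) asl_letters[label + 1]

# Classes on both sides of the former gap, plus the first and last class
label_to_letter(c(0, 8, 9, 23))
```

Class 8 maps to I and class 9 maps to K, which confirms that the relabelled classes skip J.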

We check the proportion of each class in the training dataset:

# Check Proportion of Each Class
prop.table(table(train_hand$label))
#> 
#>          0          1          2          3          4          5          6 
#> 0.04101257 0.03678747 0.04166818 0.04356219 0.03485704 0.04385358 0.03970133 
#>          7          8          9         10         11         12         13 
#> 0.03689674 0.04232380 0.04057549 0.04520124 0.03842652 0.04192315 0.04356219 
#>         14         15         16         17         18         19         20 
#> 0.03962848 0.04658532 0.04713167 0.04367146 0.04319796 0.04228738 0.03940994 
#>         21         22         23 
#> 0.04461847 0.04239665 0.04072118

Based on the values above, the class proportions are reasonably balanced (each class holds roughly 3.5% to 4.7% of the images). We therefore use the dataset without rebalancing, although we still apply data augmentation for the CNN model.

3 Modelling Neural Network

A neural network is a type of machine learning model that is inspired by the structure and function of the brain. It is made up of layers of interconnected nodes, which process and transmit information. Each node receives input from other nodes, performs a computation on that input, and produces an output that is transmitted to other nodes. The weights of the connections between nodes can be adjusted based on input and output data, allowing the neural network to “learn” from the data. Neural networks are commonly used for tasks such as image and speech recognition, language translation, and predictive modeling.

[Figure: Model Neural Network]

  • Input Layer : the first layer of a neural network. It accepts the input data and forwards it to the other layers of the network for processing.

  • Hidden Layer : performs computations on the input data and passes the results on to the output layer. A hidden layer is usually composed of many interconnected nodes, each of which applies a non-linear computation to its input and produces an output that is transmitted to the next layer.

  • Output Layer : the last layer of a neural network. It produces the final output of the network, which may be a prediction, a classification, or some other type of computation, depending on the task.
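The three kinds of layers can be illustrated with a tiny forward pass in base R. This is only a sketch with random weights and made-up sizes (4 inputs, 3 hidden nodes, 2 output classes), not the model trained later; `relu()` and `softmax()` are helper functions defined here.

```r
# Activation functions
relu <- function(z) pmax(z, 0)
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

set.seed(1)
x  <- runif(4)                         # input layer: 4 features
W1 <- matrix(rnorm(4 * 3), nrow = 4)   # weights input -> hidden
b1 <- rep(0, 3)                        # hidden biases
W2 <- matrix(rnorm(3 * 2), nrow = 3)   # weights hidden -> output
b2 <- rep(0, 2)                        # output biases

hidden <- relu(x %*% W1 + b1)          # hidden layer: weighted sum + non-linearity
output <- softmax(hidden %*% W2 + b2)  # output layer: class probabilities
output
```

The output values are positive and sum to 1, so they can be read as class probabilities. Training consists of adjusting `W1`, `b1`, `W2`, and `b2` so that these probabilities match the labels.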

3.1 Data Preparation for Neural Network

We separate the label from the pixel values. Because the pixel values range from 0 to 255, we divide them by 255 to obtain values between 0 and 1. Inputs on this small, uniform scale generally make gradient-based training more stable and faster to converge.

#Separate Training Pixel Value
train_x <- train_hand %>% 
  select(-label) %>%  
  as.matrix()/255 #divide by 255 so the value 0 ~ 1

#Separate Train Label
train_y <- train_hand$label 

#Separate Test Pixel Value
test_x <- test_hand %>% 
  select(-label) %>% 
  as.matrix()/255 #divide by 255 so the value 0 ~ 1

#Separate Test Label
test_y <- test_hand$label

We one-hot encode the labels. One-hot encoding converts a categorical variable, which has a finite set of possible values, into a numerical representation that can be used as input to machine learning algorithms.

In one-hot encoding, each categorical value is represented as a binary vector, with a 1 in the position corresponding to the categorical value and 0s in all other positions. For example, for the classes A, B, and C, the one-hot encoding would be:

A: [1, 0, 0, etc]
B: [0, 1, 0, etc]
C: [0, 0, 1, etc]

One-hot encoding is often used when working with categorical data in machine learning, because many algorithms cannot handle categorical variables directly. By encoding the categorical variables into a numerical representation, we can use these algorithms with our data.
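To make the encoding concrete, here is a minimal base R sketch of the same idea for integer labels that start at 0. `one_hot()` is a hypothetical helper written for illustration; the article itself relies on `to_categorical()` from keras.

```r
# Hypothetical helper: one-hot encode integer labels that start at 0
one_hot <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Set column (label + 1) to 1 in each row
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

# Three samples with labels 0, 1, 2 (think A, B, C)
one_hot(c(0, 1, 2), num_classes = 3)
```

Each row contains exactly one 1, located in the column of its class.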

#one hot encoding
train_y <- to_categorical(train_y , num_classes = 24)
test_y <- to_categorical(test_y , num_classes = 24)

We take input_dim from the number of columns in train_x, i.e. the number of pixels per image. We take num_class from the number of distinct label values in the dataset.

input_dim <- ncol(train_x)
num_class <- n_distinct(train_hand$label)

3.2 Initial Model

We create an initial model named model_0. We will analyze its results to build better models.

# set the random seed so the initial weights are reproducible
set_random_seed(100)

# Create Model Architecture
model_0 <- keras_model_sequential(name = "model_0") %>% 
  
  # input layer + hidden layer 1
  layer_dense(units = 64, # number of nodes in the hidden layer
              activation = "relu", # ReLU is a common default activation for hidden layers
              name = "hidden_1", 
              input_shape = input_dim # number of input nodes equals the number of pixels
              ) %>% 
  
  # hidden layer 2
  layer_dense(units = 16, 
              activation = "relu", 
              name = "hidden_2"
              ) %>% 
  
  # output layer
  layer_dense(units = num_class, 
              activation = "softmax", #activation function on output layer, we use "softmax" because the output is multiple classes
              name = "output")

model_0
#> Model: "model_0"
#> ________________________________________________________________________________
#> Layer (type)                        Output Shape                    Param #     
#> ================================================================================
#> hidden_1 (Dense)                    (None, 64)                      50240       
#> ________________________________________________________________________________
#> hidden_2 (Dense)                    (None, 16)                      1040        
#> ________________________________________________________________________________
#> output (Dense)                      (None, 24)                      408         
#> ================================================================================
#> Total params: 51,688
#> Trainable params: 51,688
#> Non-trainable params: 0
#> ________________________________________________________________________________
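The parameter counts in the summary above can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per unit. The snippet below is a small check of that arithmetic; `dense_params()` is a helper name introduced here.

```r
# Parameters of a dense layer: weights (n_in * n_out) plus biases (n_out)
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Layer widths of model_0: input, hidden_1, hidden_2, output
layer_sizes <- c(784, 64, 16, 24)

# Pair each layer with the next one
params <- dense_params(head(layer_sizes, -1), tail(layer_sizes, -1))
params
sum(params)
```

The three values match the `Param #` column (50240, 1040, 408) and their sum matches the total of 51,688.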

We set the loss function and the optimizer that updates the weights during training.

model_0 %>% 
  compile(loss = "categorical_crossentropy", #because multiple class, use "categorical_crossentropy"
          optimizer = optimizer_adam(learning_rate = 0.01), #adam optimizer works better in this case (personal trial)
          metrics = "accuracy")

We fit the model and track its performance from the first epoch to the last.

An epoch is one full pass over the training dataset, so training runs for 20 passes. The batch size is the number of images processed before each weight update, so one epoch consists of many batches.

We use 20 epochs because we saw no further improvement beyond that. The batch size of 128 comes from personal tweaking on this dataset; a bigger batch size did not improve the model.
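The relationship between dataset size, batch size, and epochs can be worked out by hand. The sketch below assumes the final, smaller batch of each epoch is kept (the usual Keras behaviour when fitting on in-memory arrays); the variable names are illustrative.

```r
n_train       <- 27455  # training rows, from dim(train_hand)
batch_size_nn <- 128    # images per weight update
n_epochs      <- 20     # full passes over the training data

# Weight updates (batches) in one epoch
steps_per_epoch <- ceiling(n_train / batch_size_nn)

# Size of the last, partial batch in each epoch
last_batch_size <- n_train - (steps_per_epoch - 1) * batch_size_nn

# Total number of weight updates over the whole training run
total_updates <- steps_per_epoch * n_epochs

c(steps_per_epoch, last_batch_size, total_updates)
```

So each epoch performs 215 weight updates (214 full batches plus one batch of 63 images), and 20 epochs give 4,300 updates in total.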

history0 <- model_0 %>% fit(x = train_x, # predictors from `train_x`
               y = train_y, # one-hot encoded labels
               epochs = 20, # number of passes over the training data
               batch_size = 128, # number of images per weight update
               validation_data = list( test_x , test_y ), # validation data
               verbose = 1) # print progress during training

hist0 <- plot(history0)
ggsave("data-output/plot/hist0.png", hist0)
history0

[Figure: history0]

Insight :

  • The model still performs very badly, as if it didn’t learn anything.

3.3 1st Iteration

We increase the number of nodes in the first and second hidden layers. We also use a smaller learning rate (0.001) so that training can converge more smoothly.

set_random_seed(100)

model_1 <- keras_model_sequential(name = "model_1") %>% 
  
  # input layer + hidden layer 1
  layer_dense(units = 128, 
              activation = "relu", 
              name = "hidden_1", 
              input_shape = input_dim 
              ) %>% 
  
  # hidden layer 2
  layer_dense(units = 64, 
              activation = "relu", 
              name = "hidden_2"
              ) %>% 

  
  # output layer
  layer_dense(units = num_class, 
              activation = "softmax", 
              name = "output")

model_1
#> Model: "model_1"
#> ________________________________________________________________________________
#> Layer (type)                        Output Shape                    Param #     
#> ================================================================================
#> hidden_1 (Dense)                    (None, 128)                     100480      
#> ________________________________________________________________________________
#> hidden_2 (Dense)                    (None, 64)                      8256        
#> ________________________________________________________________________________
#> output (Dense)                      (None, 24)                      1560        
#> ================================================================================
#> Total params: 110,296
#> Trainable params: 110,296
#> Non-trainable params: 0
#> ________________________________________________________________________________
model_1 %>% 
  compile(loss = "categorical_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.001),
          metrics = "accuracy")
# train model
history1 <- model_1 %>% fit(x = train_x, 
               y = train_y, 
               epochs = 20, 
               batch_size = 128, 
               validation_data = list( test_x , test_y ), 
               verbose = 1) 

hist1 <- plot(history1)
ggsave("data-output/plot/hist1.png", hist1)
history1

[Figure: history1]

Insight :

  • The model is still not satisfactory. An accuracy of about 70% is a clear improvement, but it is not yet good enough.
  • We see no further change in performance after epoch 12.

3.4 2nd Iteration

We add another hidden layer and change the number of nodes in each hidden layer. We also use a smaller learning rate (0.0005) in this iteration:

# Set the seed for the initial weights
set_random_seed(100)

# Build the architecture
model_2 <- keras_model_sequential(name = "model_2") %>% 
  
  # input layer + hidden layer 1
  layer_dense(units = 512, 
              activation = "relu", 
              name = "hidden_1", 
              input_shape = input_dim 
              ) %>% 
  
  # hidden layer 2
  layer_dense(units = 128, 
              activation = "relu", 
              name = "hidden_2"
              ) %>% 
  
  # hidden layer 3
  layer_dense(units = 32, 
              activation = "relu", 
              name = "hidden_3"
              ) %>% 
  
  # output layer
  layer_dense(units = num_class, 
              activation = "softmax", 
              name = "output")

model_2
#> Model: "model_2"
#> ________________________________________________________________________________
#> Layer (type)                        Output Shape                    Param #     
#> ================================================================================
#> hidden_1 (Dense)                    (None, 512)                     401920      
#> ________________________________________________________________________________
#> hidden_2 (Dense)                    (None, 128)                     65664       
#> ________________________________________________________________________________
#> hidden_3 (Dense)                    (None, 32)                      4128        
#> ________________________________________________________________________________
#> output (Dense)                      (None, 24)                      792         
#> ================================================================================
#> Total params: 472,504
#> Trainable params: 472,504
#> Non-trainable params: 0
#> ________________________________________________________________________________
model_2 %>% 
  compile(loss = "categorical_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.0005),
          metrics = "accuracy")
history2 <- model_2 %>% fit(x = train_x, 
               y = train_y, 
               epochs = 20, 
               batch_size = 128, 
               validation_data = list( test_x , test_y ), 
               verbose = 1) 

hist2 <- plot(history2)
ggsave("data-output/plot/hist2.png", hist2)
history2

[Figure: history2]

Insight :

  • The model has not improved significantly, so we think this is close to the best result achievable with this method.
  • After epoch 12 the result stabilizes at an accuracy of around 0.7 - 0.75.
  • Further tweaking of the parameters could result in over-fitting; the performance gap between the training and validation datasets is already quite large.

4 Modelling Convolutional Neural Network

A convolutional neural network (CNN) is a type of neural network specifically designed for image and video analysis. It is particularly effective at identifying patterns and features in images, making it a powerful tool for image classification, object detection, and other image processing tasks.

CNNs are characterized by their use of convolutional layers, which apply a set of filters to the input data to extract features and create a representation of the input that is easier to process. These filters are called kernels or weights, and they are adjusted during the training process to optimize the network’s performance.

The convolutional layers are followed by one or more fully connected layers, which perform a traditional neural network computation on the output of the convolutional layers. This combination of convolutional and fully connected layers allows CNNs to learn and recognize patterns and features in images and other data, while also being able to classify or predict based on those patterns.

CNNs are widely used in a variety of applications, including image and video classification, object detection, image generation, and many others.

[Figure: Convolutional Neural Network]

Compared with the plain neural network above, a Convolutional Neural Network (CNN) processes the image as a 2D array: it scans the image with small square filters.

This preserves the arrangement of pixels along the x and y axes, so the model can use the spatial relationship between neighbouring pixels. After several convolution and pooling steps the model has extracted a set of features from the image and the spatial dimensions are much smaller, so each filter effectively covers a larger part of the original image. The result is then flattened into a 1D array and processed by an ordinary (fully connected) neural network.

[Figure: CNN Matrix]

We use several types of layers in this model:

  • Convolutional Layer : applies convolutional filters, i.e. sets of weights, to small regions of the input to extract features. The output is a feature map that captures certain features of the image.

  • Normalization Layer : batch normalization standardizes the inputs to a layer. It was proposed as a way to reduce internal covariate shift (the change in the distribution of a layer's inputs caused by weight updates in the previous layers) and is used to improve training and generalization.

  • Max Pooling Layer : a downsampling technique that reduces the dimensionality of the input data by taking the maximum value of a group of adjacent elements.

  • Drop Out Layer : a regularization technique for reducing overfitting in neural networks. It is implemented as a layer in the network that “drops out” a random subset of the activations of the previous layer during training, by setting them to zero.

  • Flattening Layer : flattens the output of a convolutional or pooling layer into a single vector.

After flattening, the rest is the same as in a traditional neural network.
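The convolution, pooling, and flattening steps can be sketched in a few lines of base R. This is an illustration on a toy 6x6 matrix with a hand-picked filter, not the keras implementation: it uses no padding, a stride of 1, and a single channel, and `conv2d_valid()` and `max_pool_2x2()` are hypothetical helpers.

```r
# 2D convolution as used in CNNs (no padding, stride 1, single channel)
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)  # weighted sum over the patch
    }
  }
  out
}

# Non-overlapping 2x2 max pooling (assumes even dimensions)
max_pool_2x2 <- function(fmap) {
  out <- matrix(0, nrow(fmap) / 2, ncol(fmap) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- max(fmap[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

img <- matrix(1:36, nrow = 6)  # toy 6x6 "image"
# Filter that responds to left-to-right intensity changes
kernel <- matrix(c(1, 0, -1), nrow = 3, ncol = 3, byrow = TRUE)

feature_map <- conv2d_valid(img, kernel)  # 6x6 -> 4x4
pooled      <- max_pool_2x2(feature_map)  # 4x4 -> 2x2
flat        <- as.vector(pooled)          # 2x2 -> vector of length 4

dim(feature_map); dim(pooled); length(flat)
```

In a real CNN the filter values are not chosen by hand; they are the weights learned during training, and each layer applies many filters in parallel.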

4.1 Data Preparation for Convolutional Neural Network

Separate the pixel columns and the label, just as in the first method.

train_x_2 <- train_hand %>% 
  select(-label) 

train_y_2 <- train_hand$label

test_x_2 <- test_hand %>% 
  select(-label) 

test_y_2 <- test_hand$label
dim(train_x_2)
#> [1] 27455   784
dim(test_x_2)
#> [1] 7172  784

Each image is stored as one dimension of values (pixel 1 ~ pixel 784); we need to reshape it into height and width. To do that, we transform the matrix into an array.

sqrt(784)
#> [1] 28

Based on the number of pixels (784), the height and width are obtained by taking the square root, which gives 28.

Because the initial layout has samples in rows and pixels in columns, we first need to transpose it so that pixels come first and samples second. I got help from someone:

“When R handles indexing for arrays or matrices, it assumes that the ordering as primarily on the columns. You, however, appear to want to create a 25 x 25 matrix-slice based on successive rows of that larger matrix, so the first thing to do is transpose so the row values are in columns columns:”

— IRFTM - Stackoverflow
# transpose the matrix array
train_x_img <- t(train_x_2)  
test_x_img <- t(test_x_2)
dim(train_x_img)
#> [1]   784 27455
dim(test_x_img)
#> [1]  784 7172

Fold the first dimension into a 28x28 square. I also add a final dimension of size 1 to represent the single grayscale channel, so the array can be accepted by the CNN. A 2D convolution layer expects input of shape (n_samples, height, width, channels).

# fold the 784 pixels into 28 x 28 and add a channel dimension of size 1
dim(train_x_img) <- c(28,28, 27455, 1)
dim(test_x_img) <- c(28,28, 7172, 1)
dim(train_x_img)
#> [1]    28    28 27455     1
dim(test_x_img)
#> [1]   28   28 7172    1

Permute the array dimensions so they match the CNN input shape.

train_x_img <- aperm(train_x_img, c(3, 1,2, 4))
test_x_img <- aperm(test_x_img, c(3, 1,2, 4))
dim(train_x_img)
#> [1] 27455    28    28     1
dim(test_x_img)
#> [1] 7172   28   28    1
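The transpose, fold, and permute steps are easier to follow on a toy example. The sketch below applies the same three operations to two made-up 2x2 "images" stored one per row; all names are illustrative, and it assumes the CSV stores pixels row by row.

```r
# Two toy images of 2x2 pixels, one image per row (like the CSV)
flat <- rbind(c(1, 2, 3, 4),
              c(5, 6, 7, 8))

folded <- t(flat)               # pixels x samples: 4 x 2
dim(folded) <- c(2, 2, 2, 1)    # fold pixels into 2 x 2, then samples, then channel

# Same permutation as in the article: samples first
cnn_input <- aperm(folded, c(3, 1, 2, 4))
dim(cnn_input)
cnn_input[1, , , 1]             # first image, filled column by column

# Alternative permutation that also swaps the two spatial axes
upright <- aperm(folded, c(3, 2, 1, 4))
upright[1, , , 1]               # first image, filled row by row
```

Because R fills arrays column by column, each slice of `cnn_input` is the transpose of the row-by-row picture. The same transform is applied to the training and test sets, so the model still sees consistent inputs; the `upright` variant is only needed if the original orientation matters.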

Set target_size and batch_size once so they are used consistently throughout the code instead of typing the numbers manually. This also makes hyper-parameter tweaking easier.

# Desired height and width of images
target_size <- c(28, 28)

# Batch size for training the model
batch_size <- 128

Next, we use an image data generator to feed images to the model during training and to enrich the dataset through image augmentation, which can improve model performance.

[Figure: Image Augmentation]

The image_data_generator() function allows you to specify a set of image processing and augmentation techniques to be applied to the input images. This can include techniques such as resizing, cropping, and normalization, as well as more advanced techniques such as horizontal flipping, rotation, and color shifting.

By using it, you can easily create a data generator that can be used to feed images to your model in small batches during training. The generator will apply the specified image processing and augmentation techniques to each batch of images on the fly, allowing you to train your model with a virtually infinite amount of augmented data.

Overall, the image_data_generator() function is a powerful tool for preparing image data for training deep learning models, and can greatly improve the performance and generalization of your model.

# Image Generator
train_data_gen <- image_data_generator(featurewise_center = FALSE,
                                       samplewise_center = FALSE, 
                                       featurewise_std_normalization = FALSE, 
                                       samplewise_std_normalization = FALSE,
                                       zca_whitening = FALSE,
                                       rotation_range = 10,
                                       zoom_range = 0.1,
                                       width_shift_range=0.1,
                                       height_shift_range=0.1,
                                       horizontal_flip=FALSE,
                                       vertical_flip=FALSE,
                                       rescale=1/255)

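# Note: this validation generator applies the same random augmentation as the
# training generator. A common alternative is to keep only `rescale = 1/255`
# here so that the validation images are not perturbed.
test_data_gen <- image_data_generator(featurewise_center = FALSE,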
                                       samplewise_center = FALSE, 
                                       featurewise_std_normalization = FALSE, 
                                       samplewise_std_normalization = FALSE,
                                       zca_whitening = FALSE,
                                       rotation_range = 10,
                                       zoom_range = 0.1,
                                       width_shift_range=0.1,
                                       height_shift_range=0.1,
                                       horizontal_flip=FALSE,
                                       vertical_flip=FALSE,
                                       rescale=1/255)

Create image array generators that turn the pixel arrays into batches of images the model can read.

# Training Dataset
train_image_array_gen <- flow_images_from_data(train_x_img,
                                               y = train_y_2, 
                                               batch_size = batch_size, # without this the generator uses its default of 32
                                               seed = 123,
                                               #save_to_dir = "data-output/training", # only for debugging: saves the augmented images for inspection
                                               generator = train_data_gen)

# Validation Dataset
val_image_array_gen <- flow_images_from_data(test_x_img,
                                             y = test_y_2,
                                             batch_size = batch_size,
                                             seed = 123,
                                             #save_to_dir = "data-output/validation", 
                                             generator = test_data_gen)

Get the number of samples for both training and validation data.

# Number of training samples
train_samples <- train_image_array_gen$n

# Number of validation samples
valid_samples <- val_image_array_gen$n 

4.2 Modelling the CNN

We build only one model for the CNN method since it takes too long to knit. This model is the result of tweaking hyperparameters and revisiting the steps and architecture. The choice of parameters and layers was also inspired by many models already published by others.

tensorflow::tf$random$set_seed(123)

model_CNN <- keras_model_sequential() %>% 
  
  # First convolutional layer
  layer_conv_2d(filters = 75, # this model starts with many filters and reduces them layer by layer
                kernel_size = c(3,3), # 3 x 3 filters
                strides = 1, 
                padding = "same",
                activation = "relu", # same activation as in the dense models above
                input_shape = c(target_size, 1)
                ) %>% 
  
  # Layer Normalization
  layer_batch_normalization() %>% # normalize the output of the 1st convolutional layer

  # Max pooling layer
  layer_max_pooling_2d(pool_size = c(2,2), #pooling with 2x2 size frame
                       strides = 2,
                       padding = "same") %>% 
    
  # Second convolutional layer
  layer_conv_2d(filters = 50,
                kernel_size = c(3,3), # 3 x 3 filters
                strides = 1, 
                padding = "same",
                activation = "relu"
                ) %>% 
  
  # Layer Drop Out
  layer_dropout(rate = 0.2) %>% 
  # a dropout rate of 0.2 means that 20% of the activations will be set to zero during each training iteration
    
  # Layer Normalization
  layer_batch_normalization() %>% 
    
  # Max pooling layer
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = 2,
                       padding = "same") %>% 
    
  # Third convolutional layer
  layer_conv_2d(filters = 25,
                kernel_size = c(3,3), # 3 x 3 filters
                strides = 1, 
                padding = "same",
                activation = "relu"
                ) %>% 
  
  # Layer Normalization
  layer_batch_normalization() %>%  

  # Max pooling layer
  layer_max_pooling_2d(pool_size = c(2,2),
                       strides = 2,
                       padding = "same") %>% 
  
  # Flattening layer
  layer_flatten() %>% 
  
  # Dense layer
  layer_dense(units = 512,
              activation = "relu") %>% 
  
  # Layer Drop Out
  layer_dropout(rate = 0.3) %>% 
  
  # Output layer
  layer_dense(name = "Output",
              units = 24, 
              activation = "softmax") #multiclass case

model_CNN
#> Model: "sequential"
#> _____________________________________________________________________
#> Layer (type)                   Output Shape               Param #    
#> =====================================================================
#> conv2d_2 (Conv2D)              (None, 28, 28, 75)         750        
#> _____________________________________________________________________
#> batch_normalization_2 (BatchNo (None, 28, 28, 75)         300        
#> _____________________________________________________________________
#> max_pooling2d_2 (MaxPooling2D) (None, 14, 14, 75)         0          
#> _____________________________________________________________________
#> conv2d_1 (Conv2D)              (None, 14, 14, 50)         33800      
#> _____________________________________________________________________
#> dropout_1 (Dropout)            (None, 14, 14, 50)         0          
#> _____________________________________________________________________
#> batch_normalization_1 (BatchNo (None, 14, 14, 50)         200        
#> _____________________________________________________________________
#> max_pooling2d_1 (MaxPooling2D) (None, 7, 7, 50)           0          
#> _____________________________________________________________________
#> conv2d (Conv2D)                (None, 7, 7, 25)           11275      
#> _____________________________________________________________________
#> batch_normalization (BatchNorm (None, 7, 7, 25)           100        
#> _____________________________________________________________________
#> max_pooling2d (MaxPooling2D)   (None, 4, 4, 25)           0          
#> _____________________________________________________________________
#> flatten (Flatten)              (None, 400)                0          
#> _____________________________________________________________________
#> dense (Dense)                  (None, 512)                205312     
#> _____________________________________________________________________
#> dropout (Dropout)              (None, 512)                0          
#> _____________________________________________________________________
#> Output (Dense)                 (None, 24)                 12312      
#> =====================================================================
#> Total params: 264,049
#> Trainable params: 263,749
#> Non-trainable params: 300
#> _____________________________________________________________________
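
The numbers in the summary can be reproduced by hand. The sketch below is only an illustration in base R, assuming a 28 x 28 single-channel input (as the first output shape suggests); the helper names `conv_params`, `dense_params` and `bn_params` are made up for this example and are not part of keras.

```r
# Conv layer: (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv_params <- function(kernel, in_channels, filters) {
  (kernel[1] * kernel[2] * in_channels + 1) * filters
}
# Dense layer: (inputs + 1 bias) * units
dense_params <- function(inputs, units) (inputs + 1) * units
# Batch normalization: gamma, beta, moving mean, moving variance per channel
bn_params <- function(channels) 4 * channels

# Spatial size after each "same"-padded pooling with stride 2: 28, 14, 7, 4
sizes <- Reduce(function(s, i) ceiling(s / 2), 1:3, init = 28, accumulate = TRUE)
flat_units <- sizes[4]^2 * 25   # units produced by the flatten layer

params <- c(
  conv1  = conv_params(c(3, 3), 1, 75),
  bn1    = bn_params(75),
  conv2  = conv_params(c(3, 3), 75, 50),
  bn2    = bn_params(50),
  conv3  = conv_params(c(3, 3), 50, 25),
  bn3    = bn_params(25),
  dense  = dense_params(flat_units, 512),
  output = dense_params(512, 24)
)
total <- sum(params)

# Moving mean and moving variance are not updated by backpropagation
non_trainable <- 2 * (75 + 50 + 25)

params
total
non_trainable
```

Under these assumptions the per-layer values and the totals agree with the `Param #` column and the totals printed in the summary.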

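sparse_categorical_crossentropy and categorical_crossentropy compute the same loss and differ only in the label format they expect. sparse_categorical_crossentropy takes integer class labels, so it does not require one-hot encoding of the true output data and uses less memory, especially when the number of classes is large. categorical_crossentropy takes one-hot encoded labels. The model here is compiled with sparse_categorical_crossentropy, so the labels are passed as integers.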
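
To make the difference between the two losses concrete, here is a small base R sketch. The probability matrix and labels are invented for illustration (3 samples, 4 classes) and do not come from the dataset.

```r
# Hypothetical softmax outputs: one row per sample, rows sum to 1
probs <- matrix(c(0.7, 0.1, 0.1, 0.1,
                  0.2, 0.5, 0.2, 0.1,
                  0.1, 0.1, 0.2, 0.6),
                nrow = 3, byrow = TRUE)

# Integer labels, zero-based like the 0 ~ 23 encoding used in this article
labels <- c(0, 1, 3)

# "Sparse" form: index the probability of the true class directly
sparse_loss <- -log(probs[cbind(seq_along(labels), labels + 1)])

# "Categorical" form: build one-hot rows first, then sum over classes
one_hot <- diag(ncol(probs))[labels + 1, ]
categorical_loss <- -rowSums(one_hot * log(probs))

all.equal(sparse_loss, categorical_loss)
```

Both forms give the same per-sample loss; the sparse form simply skips building the one-hot matrix.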

model_CNN %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = optimizer_adam(lr = 0.0001),
    metrics = "accuracy"
  )

learning_rate_reduction <- callback_reduce_lr_on_plateau(
  monitor = "val_accuracy",
  patience = 2,
  verbose = 1,
  factor = 0.5,
  min_lr = 0.00001
)
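
As a rough illustration of what this callback would do if it were enabled, the sketch below applies the reduction rule (multiply by `factor`, never go below `min_lr`) a few times in base R. It assumes every reduction is actually triggered, that is, `val_accuracy` stalls for `patience` epochs each time; the variable names are made up for this example.

```r
initial_lr    <- 0.0001   # learning rate passed to optimizer_adam()
reduce_factor <- 0.5      # the callback's `factor`
min_lr        <- 0.00001  # the callback's `min_lr`

lr <- initial_lr
schedule <- lr
for (i in 1:5) {
  # each triggered reduction halves the rate, clamped at min_lr
  lr <- max(lr * reduce_factor, min_lr)
  schedule <- c(schedule, lr)
}
schedule
```

With these settings the learning rate reaches `min_lr` after the fourth reduction and stays there.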
history4 <- model_CNN %>% 
  
  fit_generator(
  # training data
  train_image_array_gen,
  
  # epochs
  steps_per_epoch = as.integer(train_samples / batch_size), 
  epochs = 30, 
  
  # validation data
  validation_data = val_image_array_gen,
  validation_steps = as.integer(valid_samples / batch_size),
  
  # print progress but don't create graphic
  verbose = 1
  # the learning-rate callback defined above is left disabled:
  # callbacks = list(learning_rate_reduction)
  
)
hist4 <- plot(history4)
ggsave("data-output/plot/hist4.png", hist4)
history4


Insight :

  • The result is significantly better than that of the traditional Neural Network models, which indicates that the CNN learns image data better.
  • The model shows no sign of overfitting.

5 Cross Validation

First we load the saved models. I saved them because compiling and fitting the models takes too long to repeat on every run.

# load model
model_CNN <- load_model_tf("data-output/model_CNN")
model_1 <- load_model_tf("data-output/model_NN1")
model_2 <- load_model_tf("data-output/model_NN2")

Predict the classes of the test dataset with each model.

#predict for different model
pred_test_cnn <- predict_classes(model_CNN, test_x_img/255) 
pred_test_nn1 <- predict_classes(model_1, test_x)
pred_test_nn2 <- predict_classes(model_2, test_x)
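
If `predict_classes()` is not available in your keras version, the same class indices can be recovered from the predicted probabilities by taking the arg-max of each row. The sketch below shows the idea in base R on an invented probability matrix; it assumes the model's `predict()` returns one row of class probabilities per image.

```r
# Hypothetical predicted probabilities: 3 images, 3 classes
probs <- matrix(c(0.10, 0.80, 0.10,
                  0.60, 0.30, 0.10,
                  0.05, 0.15, 0.80),
                nrow = 3, byrow = TRUE)

# Column index of the largest probability in each row,
# shifted by 1 to match the zero-based label encoding
pred_class <- max.col(probs, ties.method = "first") - 1
pred_class
```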

Convert the actual labels into an array so they can be processed by confusionMatrix().

actual_test <- as.array(test_hand$label)
class(actual_test)
#> [1] "array"

Create a decode function that converts the encoding 0 ~ 23 into the letter represented by the image.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "A",
            x == 1 ~ "B",
            x == 2 ~ "C",
            x == 3 ~ "D",
            x == 4 ~ "E",
            x == 5 ~ "F",
            x == 6 ~ "G",
            x == 7 ~ "H",
            x == 8 ~ "I",
            x == 9 ~ "K",
            x == 10 ~ "L",
            x == 11 ~ "M",
            x == 12 ~ "N",
            x == 13 ~ "O",
            x == 14 ~ "P",
            x == 15 ~ "Q",
            x == 16 ~ "R",
            x == 17 ~ "S",
            x == 18 ~ "T",
            x == 19 ~ "U",
            x == 20 ~ "V",
            x == 21 ~ "W",
            x == 22 ~ "X",
            x == 23 ~ "Y"
            )
}
#decode the result and actual label
pred_test_cnn <- sapply(pred_test_cnn, decode) 
pred_test_nn1 <- sapply(pred_test_nn1, decode) 
pred_test_nn2 <- sapply(pred_test_nn2, decode)
actual_test <- sapply(actual_test, decode)
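
A more compact alternative, shown here only as a sketch, is a lookup vector: the 24 letters in order, without J and Z. Indexing is vectorized, so no `sapply()` is needed. The names `letters_24` and `decode_lookup` are made up for this example.

```r
# The 24 letters of the dataset in order (J and Z excluded)
letters_24 <- setdiff(LETTERS, c("J", "Z"))

# Zero-based encoding -> letter; works on a whole vector at once
decode_lookup <- function(x) letters_24[x + 1]

decode_lookup(c(0, 8, 9, 23))
```

This should give the same mapping as the `case_when()` version above, for example 9 maps to "K" because J is skipped.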

5.1 Model CNN

confusionMatrix(as.factor(pred_test_cnn), 
                as.factor(actual_test)
                )
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R
#>          A 331   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          B   0 432   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          C   0   0 310   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          D   0   0   0 245   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          E   0   0   0   0 498   0   0   0   0   0   0   0   0   0   0   0   0
#>          F   0   0   0   0   0 247   0   0   0   0   0   0   0   0   0   0   0
#>          G   0   0   0   0   0   0 348   0   0   0   0   0   0   0   0   0   0
#>          H   0   0   0   0   0   0   0 436   0   0   0   0   0   0   0   0   0
#>          I   0   0   0   0   0   0   0   0 288   0   0   0   0   0   0   0   0
#>          K   0   0   0   0   0   0   0   0   0 331   0   0   0   0   0   0   0
#>          L   0   0   0   0   0   0   0   0   0   0 209   0   0   0   0   0   0
#>          M   0   0   0   0   0   0   0   0   0   0   0 394   0   0   0   0   0
#>          N   0   0   0   0   0   0   0   0   0   0   0   0 291   0   0   0   0
#>          O   0   0   0   0   0   0   0   0   0   0   0   0   0 246   0   0   0
#>          P   0   0   0   0   0   0   0   0   0   0   0   0   0   0 347   0   0
#>          Q   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 164   0
#>          R   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 144
#>          S   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          T   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          U   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          V   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          W   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          X   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          Y   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
#>           Reference
#> Prediction   S   T   U   V   W   X   Y
#>          A   0   0   0   0   0   0   0
#>          B   0   0   0   0   0   0   0
#>          C   0   0   0   0   0   0   0
#>          D   0   0   0   0   0   0   0
#>          E   9   0   0   0   0   0   0
#>          F   0   0   0   0   0   0   0
#>          G   0   0   0   0   0   0   0
#>          H   0   1   0   0   0   0   0
#>          I   0   0   0   0   0   0   0
#>          K   0   0   0   0   0   0   0
#>          L   0   0   0   0   0   0   0
#>          M   0   0   0   0   0   0   0
#>          N   0   0   0   0   0   0   0
#>          O   0   0   0   0   0   0   0
#>          P   0   0   0   0   0   0   0
#>          Q   0   0   0   0   0   0   0
#>          R   0   0   0   0   0   0   0
#>          S 237   0   0   0   0   0   0
#>          T   0 247   0   0   0   0   0
#>          U   0   0 266   0   0   0   0
#>          V   0   0   0 346   0   0  20
#>          W   0   0   0   0 206   0   0
#>          X   0   0   0   0   0 267   0
#>          Y   0   0   0   0   0   0 312
#> 
#> Overall Statistics
#>                                          
#>                Accuracy : 0.9958         
#>                  95% CI : (0.994, 0.9972)
#>     No Information Rate : 0.0694         
#>     P-Value [Acc > NIR] : < 2.2e-16      
#>                                          
#>                   Kappa : 0.9956         
#>                                          
#>  Mcnemar's Test P-Value : NA             
#> 
#> Statistics by Class:
#> 
#>                      Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity           1.00000  1.00000  1.00000  1.00000  1.00000  1.00000
#> Specificity           1.00000  1.00000  1.00000  1.00000  0.99865  1.00000
#> Pos Pred Value        1.00000  1.00000  1.00000  1.00000  0.98225  1.00000
#> Neg Pred Value        1.00000  1.00000  1.00000  1.00000  1.00000  1.00000
#> Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
#> Detection Rate        0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
#> Detection Prevalence  0.04615  0.06023  0.04322  0.03416  0.07069  0.03444
#> Balanced Accuracy     1.00000  1.00000  1.00000  1.00000  0.99933  1.00000
#>                      Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity           1.00000  1.00000  1.00000  1.00000  1.00000  1.00000
#> Specificity           1.00000  0.99985  1.00000  1.00000  1.00000  1.00000
#> Pos Pred Value        1.00000  0.99771  1.00000  1.00000  1.00000  1.00000
#> Neg Pred Value        1.00000  1.00000  1.00000  1.00000  1.00000  1.00000
#> Prevalence            0.04852  0.06079  0.04016  0.04615  0.02914  0.05494
#> Detection Rate        0.04852  0.06079  0.04016  0.04615  0.02914  0.05494
#> Detection Prevalence  0.04852  0.06093  0.04016  0.04615  0.02914  0.05494
#> Balanced Accuracy     1.00000  0.99993  1.00000  1.00000  1.00000  1.00000
#>                      Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity           1.00000   1.0000  1.00000  1.00000  1.00000  0.96341
#> Specificity           1.00000   1.0000  1.00000  1.00000  1.00000  1.00000
#> Pos Pred Value        1.00000   1.0000  1.00000  1.00000  1.00000  1.00000
#> Neg Pred Value        1.00000   1.0000  1.00000  1.00000  1.00000  0.99870
#> Prevalence            0.04057   0.0343  0.04838  0.02287  0.02008  0.03430
#> Detection Rate        0.04057   0.0343  0.04838  0.02287  0.02008  0.03305
#> Detection Prevalence  0.04057   0.0343  0.04838  0.02287  0.02008  0.03305
#> Balanced Accuracy     1.00000   1.0000  1.00000  1.00000  1.00000  0.98171
#>                      Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity           0.99597  1.00000  1.00000  1.00000  1.00000  0.93976
#> Specificity           1.00000  1.00000  0.99707  1.00000  1.00000  1.00000
#> Pos Pred Value        1.00000  1.00000  0.94536  1.00000  1.00000  1.00000
#> Neg Pred Value        0.99986  1.00000  1.00000  1.00000  1.00000  0.99708
#> Prevalence            0.03458  0.03709  0.04824  0.02872  0.03723  0.04629
#> Detection Rate        0.03444  0.03709  0.04824  0.02872  0.03723  0.04350
#> Detection Prevalence  0.03444  0.03709  0.05103  0.02872  0.03723  0.04350
#> Balanced Accuracy     0.99798  1.00000  0.99854  1.00000  1.00000  0.96988
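
The statistics above follow directly from the counts in the confusion matrix. The sketch below recomputes a few of them in base R, using the diagonal and the three non-zero off-diagonal cells read from the CNN output; the variable names are chosen for this example.

```r
# Correct predictions per class (diagonal of the CNN confusion matrix)
correct <- c(A = 331, B = 432, C = 310, D = 245, E = 498, F = 247, G = 348, H = 436,
             I = 288, K = 331, L = 209, M = 394, N = 291, O = 246, P = 347, Q = 164,
             R = 144, S = 237, T = 247, U = 266, V = 346, W = 206, X = 267, Y = 312)

# The only misclassifications: 9 S predicted as E, 1 T as H, 20 Y as V
errors <- c(S_as_E = 9, T_as_H = 1, Y_as_V = 20)

# Accuracy: correct predictions over all test images
accuracy <- sum(correct) / (sum(correct) + sum(errors))

# Sensitivity of Y: correctly predicted Y over all actual Y
sensitivity_Y <- correct[["Y"]] / (correct[["Y"]] + errors[["Y_as_V"]])

# Pos Pred Value of V: correctly predicted V over everything predicted as V
ppv_V <- correct[["V"]] / (correct[["V"]] + errors[["Y_as_V"]])

round(c(accuracy = accuracy, sensitivity_Y = sensitivity_Y, ppv_V = ppv_V), 5)
```

The rounded results agree with the Accuracy, the Sensitivity of Class: Y and the Pos Pred Value of Class: V reported by `confusionMatrix()`.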

5.2 Model Neural Network 1st Iteration

confusionMatrix(as.factor(pred_test_nn1), 
                as.factor(actual_test)
                )
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R
#>          A 308   0   0   0   0   0   0   0   5   0   0   0  45   0   0   0   0
#>          B   0 387   0  10   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          C   0   0 268   0   0   2   0   5   0   0  18   0   0  22   0   0   0
#>          D   0  17   0 209   0   0   0   0   0   0   0   0  13   0   0   0   0
#>          E   0   0   0   0 434   0   0  21   0   0   0  81  32  19   0   0   0
#>          F   0   0  21   0   0 208   0   0  20  63   0   0   0  22   0  19   0
#>          G   0   0   0   0   0   3 243  35   0   0  21   0   0   0   0   0   0
#>          H   0   0   0   0   0   0  39 375   0   0   0   0   0   0   0   0   0
#>          I   1   0   0   0   0   0   1   0 203   0   0   2   0   0   9   0   0
#>          K   0   0   0   0   0   4   0   0   0 105   0   0   0   0   0   0   0
#>          L   0   0   0   0   0   0   0   0   0   0 157   0   0   0   0   0   0
#>          M   0   0   0   0   1   0   2   0   0  20   0 190  36   0   0   1   0
#>          N   0   0   0   2   0   0   0   0   0   0   0  21 131   0   0   0   0
#>          O  22   0  21   0   0   0   0   0   0   0   0   5   5 144   0   0   0
#>          P   0   0   0   0   0   0   0   0   0   0   0   0   0   0 328   0   0
#>          Q   0   0   0   0   0   0  41   0  10   0   0   3   5  12  10 126   0
#>          R   0  22   0   0   0   0   0   0  24  42   0   0   0   0   0   0  58
#>          S   0   0   0   0  63   0   0   0   0  21   0  92   5   0   0  18   2
#>          T   0   0   0   0   0  19  22   0  11   0   0   0  16  27   0   0   0
#>          U   0   0   0   0   0   0   0   0   0  19   0   0   0   0   0   0  55
#>          V   0   0   0   4   0   3   0   0   4   3   0   0   0   0   0   0  23
#>          W   0   6   0   0   0   8   0   0   1  58   6   0   0   0   0   0   6
#>          X   0   0   0  20   0   0   0   0   0   0   7   0   3   0   0   0   0
#>          Y   0   0   0   0   0   0   0   0  10   0   0   0   0   0   0   0   0
#>           Reference
#> Prediction   S   T   U   V   W   X   Y
#>          A   0   0   0   0   0   0   0
#>          B   0   0   0   0   0   0  18
#>          C   0   0   0   0   0   0   0
#>          D   0   0  18   0   0   0   0
#>          E  24   0   0   0   0   0   0
#>          F   0   0   0  20   0   0   0
#>          G   0   0   0   0   0   0   0
#>          H  20   0   0   2   0   0   0
#>          I  41  21   0   0   0   1  24
#>          K   0   0  21   1   0   0   0
#>          L   0   0   0   0   0   0  41
#>          M  41   0   0   0   0   0   0
#>          N  21   0   0   0   0   0   4
#>          O   0   0   0   0   0   0   0
#>          P   0   0   0  19   0   5   0
#>          Q   0   0   0   1   0   0   0
#>          R   0   0   0   0   6   0   0
#>          S  99   0   0   0   0  16  20
#>          T   0 164   0  23  20  21  21
#>          U   0   0 174   1  20   0   0
#>          V   0   2  32 162  10   0  40
#>          W   0   0  21 117 150  51   0
#>          X   0  61   0   0   0 173   0
#>          Y   0   0   0   0   0   0 164
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.6916          
#>                  95% CI : (0.6807, 0.7023)
#>     No Information Rate : 0.0694          
#>     P-Value [Acc > NIR] : < 2.2e-16       
#>                                           
#>                   Kappa : 0.6773          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity           0.93051  0.89583  0.86452  0.85306  0.87149  0.84211
#> Specificity           0.99269  0.99585  0.99315  0.99307  0.97348  0.97617
#> Pos Pred Value        0.86034  0.93253  0.85079  0.81323  0.71031  0.55764
#> Neg Pred Value        0.99662  0.99334  0.99387  0.99479  0.99025  0.99426
#> Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
#> Detection Rate        0.04294  0.05396  0.03737  0.02914  0.06051  0.02900
#> Detection Prevalence  0.04992  0.05786  0.04392  0.03583  0.08519  0.05201
#> Balanced Accuracy     0.96160  0.94584  0.92883  0.92307  0.92248  0.90914
#>                      Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity           0.69828  0.86009  0.70486  0.31722  0.75120  0.48223
#> Specificity           0.99135  0.99094  0.98547  0.99620  0.99411  0.98510
#> Pos Pred Value        0.80464  0.86009  0.66997  0.80153  0.79293  0.65292
#> Neg Pred Value        0.98472  0.99094  0.98763  0.96790  0.99254  0.97035
#> Prevalence            0.04852  0.06079  0.04016  0.04615  0.02914  0.05494
#> Detection Rate        0.03388  0.05229  0.02830  0.01464  0.02189  0.02649
#> Detection Prevalence  0.04211  0.06079  0.04225  0.01827  0.02761  0.04057
#> Balanced Accuracy     0.84481  0.92552  0.84517  0.65671  0.87265  0.73367
#>                      Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity           0.45017  0.58537  0.94524  0.76829 0.402778  0.40244
#> Specificity           0.99302  0.99235  0.99648  0.98830 0.986625  0.96578
#> Pos Pred Value        0.73184  0.73096  0.93182  0.60577 0.381579  0.29464
#> Neg Pred Value        0.97712  0.98538  0.99721  0.99454 0.987749  0.97850
#> Prevalence            0.04057  0.03430  0.04838  0.02287 0.020078  0.03430
#> Detection Rate        0.01827  0.02008  0.04573  0.01757 0.008087  0.01380
#> Detection Prevalence  0.02496  0.02747  0.04908  0.02900 0.021194  0.04685
#> Balanced Accuracy     0.72160  0.78886  0.97086  0.87830 0.694701  0.68411
#>                      Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity           0.66129  0.65414  0.46821  0.72816  0.64794  0.49398
#> Specificity           0.97400  0.98624  0.98227  0.96067  0.98682  0.99854
#> Pos Pred Value        0.47674  0.64684  0.57244  0.35377  0.65530  0.94253
#> Neg Pred Value        0.98770  0.98667  0.97329  0.99170  0.98639  0.97599
#> Prevalence            0.03458  0.03709  0.04824  0.02872  0.03723  0.04629
#> Detection Rate        0.02287  0.02426  0.02259  0.02091  0.02412  0.02287
#> Detection Prevalence  0.04796  0.03751  0.03946  0.05912  0.03681  0.02426
#> Balanced Accuracy     0.81765  0.82019  0.72524  0.84441  0.81738  0.74626

5.3 Model Neural Network 2nd Iteration

confusionMatrix(as.factor(pred_test_nn2), 
                as.factor(actual_test)
                )
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R
#>          A 331   0   0   0   0   0   0   0  39   0   0  18  63   0   0   0   0
#>          B   0 368   0   8   0   1   0   0   2   0   0   0   0   0   0   0   0
#>          C   0   0 289   0   0  20   0   0   0   0  12   0   0  28   0   0   0
#>          D   0  37   0 204   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          E   0   0   0   0 473   0   0   0   0   0   0  39   0  21   0   0   0
#>          F   0   0  21   0   0 225   0   0   0   5  27   0   0  14   5   0   0
#>          G   0   0   0   0   0   0 238  20   0   0   0   0   0   0   0   0   0
#>          H   0   0   0   0   0   0  42 396   0   0   0   0   0  20   0   0   0
#>          I   0   0   0   0   0   0   0   0 199  35   0   2   0   0   2   0   0
#>          K   0   0   0   0   0   0   0   0   2 210   0   0   0   0   0   0   0
#>          L   0   0   0   0   0   0   0   0   0   0 148   0   0   0   0   0  19
#>          M   0   0   0   0   0   0   0   0   0   0   0 262  39   0   0   2   0
#>          N   0   0   0   0   0   0   0  20   0   0   0  21 124   0   0  21   0
#>          O   0   0   0   0   0   0   2   0   0   0   0   0  42 162   0   0   0
#>          P   0   0   0   0   0   0   3   0   0   8   0   0   0   0 320   0   0
#>          Q   0   0   0   0   0   0  38   0   0   0   0   1  17   0   7 141   0
#>          R   0   0   0  17   0   0   0   0  19  15   0   0   0   0   0   0  71
#>          S   0   0   0   0  25   0   0   0   6   0   0  51   2   0   3   0  25
#>          T   0   0   0   0   0   1  20   0   0   0   0   0   4   1   0   0   0
#>          U   0  25   0  13   0   0   0   0   0  20   0   0   0   0  10   0   5
#>          V   0   0   0   0   0   0   5   0   0   2   3   0   0   0   0   0   4
#>          W   0   2   0   0   0   0   0   0   0  14   0   0   0   0   0   0   0
#>          X   0   0   0   3   0   0   0   0   0   0   0   0   0   0   0   0   0
#>          Y   0   0   0   0   0   0   0   0  21  22  19   0   0   0   0   0  20
#>           Reference
#> Prediction   S   T   U   V   W   X   Y
#>          A   0   0   0   0   0   0   0
#>          B   0   0   0   0  19   0  19
#>          C   0   0   0   0   0   0   0
#>          D   0   0   4   0   0   0   0
#>          E  21   0   0   0   0   0   0
#>          F   0  16   0  20   0   2   0
#>          G   0   3   0   0   0   0   0
#>          H  20   8   0   0   0   0   0
#>          I  21  19   6   0  21   0  43
#>          K   0   0  21  20   0   0  37
#>          L   0   2   0   0   0   8  18
#>          M  64   0   0   0   0   0   0
#>          N  16   0   0   0   0   0  16
#>          O   0   0   0   0   0   0   0
#>          P   0   0   0  20   0   0   0
#>          Q   0   0   0   0   0   0   0
#>          R   0   1  14   0   7  19  24
#>          S 104   0   0   0   0  42   5
#>          T   0 155   0   0   0  19  21
#>          U   0   0 160  19   0   0   0
#>          V   0   2  20 224   5   0   0
#>          W   0   2   0  29 154  19   0
#>          X   0  40   0   0   0 158   0
#>          Y   0   0  41  14   0   0 149
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.7341          
#>                  95% CI : (0.7237, 0.7443)
#>     No Information Rate : 0.0694          
#>     P-Value [Acc > NIR] : < 2.2e-16       
#>                                           
#>                   Kappa : 0.7216          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity           1.00000  0.85185  0.93226  0.83265  0.94980  0.91093
#> Specificity           0.98246  0.99273  0.99126  0.99408  0.98786  0.98412
#> Pos Pred Value        0.73392  0.88249  0.82808  0.83265  0.85379  0.67164
#> Neg Pred Value        1.00000  0.99053  0.99692  0.99408  0.99622  0.99678
#> Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
#> Detection Rate        0.04615  0.05131  0.04030  0.02844  0.06595  0.03137
#> Detection Prevalence  0.06288  0.05814  0.04866  0.03416  0.07724  0.04671
#> Balanced Accuracy     0.99123  0.92229  0.96176  0.91337  0.96883  0.94752
#>                      Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity           0.68391  0.90826  0.69097  0.63444  0.70813  0.66497
#> Specificity           0.99663  0.98664  0.97836  0.98831  0.99325  0.98451
#> Pos Pred Value        0.91188  0.81481  0.57184  0.72414  0.75897  0.71390
#> Neg Pred Value        0.98408  0.99402  0.98696  0.98242  0.99126  0.98060
#> Prevalence            0.04852  0.06079  0.04016  0.04615  0.02914  0.05494
#> Detection Rate        0.03318  0.05521  0.02775  0.02928  0.02064  0.03653
#> Detection Prevalence  0.03639  0.06776  0.04852  0.04044  0.02719  0.05117
#> Balanced Accuracy     0.84027  0.94745  0.83466  0.81137  0.85069  0.82474
#>                      Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity           0.42612  0.65854  0.92219  0.85976  0.49306  0.42276
#> Specificity           0.98634  0.99365  0.99546  0.99101  0.98349  0.97704
#> Pos Pred Value        0.56881  0.78641  0.91168  0.69118  0.37968  0.39544
#> Neg Pred Value        0.97599  0.98794  0.99604  0.99670  0.98955  0.97945
#> Prevalence            0.04057  0.03430  0.04838  0.02287  0.02008  0.03430
#> Detection Rate        0.01729  0.02259  0.04462  0.01966  0.00990  0.01450
#> Detection Prevalence  0.03040  0.02872  0.04894  0.02844  0.02607  0.03667
#> Balanced Accuracy     0.70623  0.82609  0.95882  0.92538  0.73828  0.69990
#>                      Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity           0.62500  0.60150  0.64740  0.74757  0.59176  0.44880
#> Specificity           0.99047  0.98668  0.99399  0.99053  0.99377  0.97997
#> Pos Pred Value        0.70136  0.63492  0.84528  0.70000  0.78607  0.52098
#> Neg Pred Value        0.98662  0.98468  0.98234  0.99252  0.98436  0.97342
#> Prevalence            0.03458  0.03709  0.04824  0.02872  0.03723  0.04629
#> Detection Rate        0.02161  0.02231  0.03123  0.02147  0.02203  0.02078
#> Detection Prevalence  0.03081  0.03514  0.03695  0.03067  0.02803  0.03988
#> Balanced Accuracy     0.80773  0.79409  0.82070  0.86905  0.79277  0.71438

6 Conclusion

  • The CNN model performs very well: it reaches 99.58% accuracy on the test dataset and predicts many classes with zero false predictions.
  • This shows that classification with a Convolutional Neural Network suits this case well.
  • It also suggests that a 2D matrix of pixels provides more information than a 1D array of pixels.
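
For a side-by-side view, the short base R sketch below collects the three test accuracies reported in section 5 and turns them into error rates. The vector names are chosen for this example.

```r
# Test accuracy reported by confusionMatrix() for each model
accuracy <- c(CNN = 0.9958, NN1 = 0.6916, NN2 = 0.7341)

# Share of misclassified test images per model
error_rate <- 1 - accuracy

data.frame(model = names(accuracy),
           accuracy = accuracy,
           error_rate = error_rate,
           row.names = NULL)
```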

7 Improvement and Opinion

  • We may be able to create a very accurate model with very little error (0.0001% or smaller) if we train it on a bigger and richer dataset.

  • For example: larger images, sharper images, and more uniform ambient light and illumination.

  • A more uniform dataset, however, could make the model less robust, so we have to consider what the model is for.

  • If the model is meant for webcam input or everyday videos/images of sign language, we may want to train it on a large variation of images.

# save the models so they can be loaded again
save_model_tf(model_CNN, filepath = "data-output/model_CNN")
save_model_tf(model_1, filepath = "data-output/model_NN1")
save_model_tf(model_2, filepath = "data-output/model_NN2")

8 Reference

  1. MNIST : American Hand Sign Dataset
  2. Stackoverflow : Changing 2D array into 3D
  3. Kaggle : CNN Using Keras - Python
  4. ChatGPT : Explanations, Code Tweaks and Corrections