American Sign Language Recognition

Introduction

Communication is an important part of our lives. People who are deaf or mute, being unable to hear or speak, face many difficulties when communicating with hearing people. There are many ways in which people with these disabilities try to communicate; one of the most prominent is sign language, i.e. hand gestures. An application that recognizes sign-language gestures and actions would let deaf and mute people communicate easily even with those who do not understand sign language. The objective of this work is to take an elementary step toward breaking the communication barrier between hearing people and deaf and mute people with the help of sign language.

American Sign Language (ASL) is a complete, natural language that has the same linguistic properties as spoken languages, with grammar that differs from English. ASL is expressed by movements of the hands and face. It is the primary language of many North Americans who are deaf and hard of hearing, and is used by many hearing people as well.

Import Data

First, load the required libraries:

library(keras)
library(dplyr)
library(caret)
library(kableExtra)

I will use the Sign Language MNIST dataset, which can be downloaded from Kaggle. Download sign_mnist_train.csv as the train data and sign_mnist_test.csv as the test data; both files store sign-language images measuring 28 x 28 pixels across 24 different categories.

train <- read.csv("data/sign_mnist_train.csv")
test <- read.csv("data/sign_mnist_test.csv")

Exploratory Data Analysis & Preprocessing

# Check data dimension 
dim(train)
[1] 27455   785
dim(test)
[1] 7172  785
# check the pixel dimension of each picture
sqrt(784)
[1] 28

The train data consists of 27455 observations and 785 variables (1 target and 784 predictors); each predictor represents one pixel of the image. Let's check the categories of the target variable in both the train and test data using the unique() function:

sort(unique(train$label))
 [1]  0  1  2  3  4  5  6  7  8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
sort(unique(test$label))
 [1]  0  1  2  3  4  5  6  7  8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

We need to fix the categories of the target variable in both the train and test data. Since labels 9 and 25 are missing, we can subtract 1 from every label greater than 9; this way, our labels become consecutive integers from 0 to 23. We can use the mutate() and ifelse() functions to fix the category in both datasets.

train <- train %>% 
  mutate(label = ifelse(label > 9, label-1, label))

test <- test %>% 
  mutate(label = ifelse(label > 9, label-1, label))
# cross-check
sort(unique(train$label))
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
sort(unique(test$label))
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Let’s take a look at the first 32 pictures from train.

vizTrain <- function(input) {
  dimmax <- sqrt(ncol(train[, -1]))  # image width/height in pixels (28)
  cols <- 8                          # display 8 images per row
  rows <- floor((nrow(input) - 1) / cols) + 1
  par(mfrow = c(rows, cols), mar = c(0.1, 0.1, 0.1, 0.1))
  for (i in 1:nrow(input)) {
    # rebuild the 28 x 28 pixel matrix from the flat row of pixel values
    m1 <- matrix(input[i, 2:ncol(input)], nrow = dimmax, byrow = TRUE)
    m1 <- apply(m1, 2, as.numeric)
    m1 <- t(apply(m1, 2, rev))       # reorient so the image is upright
    image(1:dimmax, 1:dimmax, m1, col = grey.colors(255), xaxt = "n", yaxt = "n")
    text(3, 26, col = "black", cex = 1.2, train[i, 1])  # overlay the label
  }
}

vizTrain(train[1:32, ])

We perform grayscale normalization to reduce the effect of differences in illumination. The pixel values are currently stored in a data.frame, so we first have to separate the predictors from the targets in both the train and test data; we can use the select() function for this.

After that, we convert the data into matrix form before building a model, using the data.matrix() function. For the predictor matrices train_x and test_x we also apply feature scaling, dividing by 255 to normalize the pixel values from the 0-255 range to the 0-1 range.

# Predictor variables in `train`
train_x <- train %>% 
  select(-label) %>% 
  data.matrix()/255

# Predictor variables in `test`
test_x <- test %>% 
  select(-label) %>% 
  data.matrix()/255 

# Target variable in `train`
train_y <- train %>% 
  select(label) %>% 
  data.matrix() 

# Target variable in `test`
test_y <- test %>% 
  select(label) %>% 
  data.matrix() 

Next, we have to convert the predictor matrices into arrays, using the array_reshape(data, dim(data)) function.

# Predictor variables in `train_x`
train_x_array <- array_reshape(train_x, dim(train_x)) 

# Predictor variables in `test_x`
test_x_array <- array_reshape(test_x, dim(test_x)) 

NN models don’t recognize categorical features, so we need to one-hot encode the labels train_y and test_y. One-hot encoding generates a column of ones and zeros for each category: in our case, the result is a matrix with 24 columns in which each row contains all zeros except for a single cell with the value 1, and the column in which that 1 occurs corresponds to the label the row represents. Let’s one-hot encode the target variable train_y using the to_categorical() function from keras and store the result as train_y_keras.

# Target variable in `train_y`
train_y_keras <- train_y %>% 
  to_categorical(num_classes = 24)

head(train_y_keras)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0    0    0    1    0    0    0    0    0     0     0     0     0     0
[2,]    0    0    0    0    0    0    1    0    0     0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    0     0     0     0     0     0
[4,]    0    0    1    0    0    0    0    0    0     0     0     0     0     0
[5,]    0    0    0    0    0    0    0    0    0     0     0     0     1     0
[6,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
[1,]     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0     0     0
[6,]     0     1     0     0     0     0     0     0     0     0
test_y_keras <- test_y %>% 
  to_categorical(num_classes = 24)

head(test_y_keras)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0    0    0    0    0    0    1    0    0     0     0     0     0     0
[2,]    0    0    0    0    0    1    0    0    0     0     0     0     0     0
[3,]    0    0    0    0    0    0    0    0    0     1     0     0     0     0
[4,]    1    0    0    0    0    0    0    0    0     0     0     0     0     0
[5,]    0    0    0    1    0    0    0    0    0     0     0     0     0     0
[6,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
[1,]     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0     0     0
[6,]     0     0     0     0     0     0     1     0     0     0

Modeling

Before creating the model, I would like to introduce first about neural network algorithm which we used. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on top of the data you store and manage. They help to group unlabeled data according to similarities among the example inputs, and they classify data when they have a labeled dataset to train on. (Neural networks can also extract features that are fed to other algorithms for clustering and classification; so you can think of deep neural networks as components of larger machine-learning applications involving algorithms for reinforcement learning, classification and regression.)

Neural Network Elements

Deep learning is the name we use for “stacked neural networks”; that is, networks composed of several layers. The layers are made of nodes. A node is just a place where computation happens, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn; e.g. which input is most helpful in classifying data without error? These input-weight products are summed, and the sum is passed through the node’s so-called activation function, which determines whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification. If the signal passes through, the neuron has been “activated”.

A node layer is a row of those neuron-like switches that turn on or off as the input is fed through the net. Each layer’s output is simultaneously the subsequent layer’s input, starting from an initial input layer receiving your data.
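
To make this concrete, here is a minimal sketch in R of a single node, with made-up inputs and weights: the input-weight products are summed, a bias is added, and the result is passed through a ReLU activation.

# One artificial neuron, with made-up numbers
x <- c(0.5, 0.2, 0.8)    # inputs (e.g. three pixel values)
w <- c(0.4, -0.6, 0.9)   # weights that amplify or dampen each input
b <- 0.1                 # bias term
z <- sum(w * x) + b      # weighted sum of input-weight products
relu <- function(v) pmax(v, 0)
relu(z)                  # how much signal passes on to the next layer
[1] 0.9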

Model Architecture

Now let’s create the model. To organize the layers, we create a base model, a sequential model, by calling the keras_model_sequential() function, and then pipe the layer architecture onto it. In this modeling part I will build two models with different layers, tuning several parameters, and compare the accuracy metric between them.

First, create a model with two hidden layers, using the following parameters:

  • the first hidden layer contains 128 nodes, relu activation function, 784 input shape
  • the second hidden layer contains 64 nodes, relu activation function
  • the output layer contains 24 nodes, softmax activation function

tensorflow::tf$random$set_seed(123)

model_1 <- keras_model_sequential(name = "Model_1") %>% 
  layer_dense(input_shape = ncol(train_x_array),
              units = 128, 
              activation = "relu", 
              name = "hidden_1" ) %>% 
  layer_dense(units = 64,
              activation = "relu",
              name = "hidden_2") %>% 
  layer_dense(units = 24,
              activation = "softmax",
              name = "output") 

model_1
Model
Model: "Model_1"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
hidden_1 (Dense)                    (None, 128)                     100480      
________________________________________________________________________________
hidden_2 (Dense)                    (None, 64)                      8256        
________________________________________________________________________________
output (Dense)                      (None, 24)                      1560        
================================================================================
Total params: 110,296
Trainable params: 110,296
Non-trainable params: 0
________________________________________________________________________________

Second, create a model with three hidden layers, using the following parameters:

  • the first hidden layer contains 512 nodes, relu activation function, 784 input shape
  • the second hidden layer contains 256 nodes, relu activation function
  • the third hidden layer contains 128 nodes, relu activation function
  • the output layer contains 24 nodes, softmax activation function

tensorflow::tf$random$set_seed(123)

model_2 <- keras_model_sequential(name = "Model_2") %>% 
  layer_dense(input_shape = ncol(train_x_array),
              units = 512, 
              activation = "relu", 
              name = "hidden_1" ) %>% 
  layer_dense(units = 256,
              activation = "relu",
              name = "hidden_2") %>% 
  layer_dense(units = 128,
              activation = "relu",
              name = "hidden_3") %>% 
  layer_dense(units = 24,
              activation = "softmax",
              name = "output") 

model_2
Model
Model: "Model_2"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
hidden_1 (Dense)                    (None, 512)                     401920      
________________________________________________________________________________
hidden_2 (Dense)                    (None, 256)                     131328      
________________________________________________________________________________
hidden_3 (Dense)                    (None, 128)                     32896       
________________________________________________________________________________
output (Dense)                      (None, 24)                      3096        
================================================================================
Total params: 569,240
Trainable params: 569,240
Non-trainable params: 0
________________________________________________________________________________

Model Fitting & Evaluation

Using the compile() function we can configure the model and specify the following parameters:

  • Loss Function - categorical_crossentropy is used because this is a multiclass, single-label problem: each image belongs to exactly one category.

  • Optimizer - optimizer_adam(); I will use the adam optimizer with a learning rate of 0.001, a commonly used default that works well here.

  • Metrics - here we specify that we want the model evaluated on classification accuracy.

  • Check model_1

model_1 %>% 
    compile(loss = "categorical_crossentropy", 
          optimizer = optimizer_adam(learning_rate = 0.001),   
          metrics = "accuracy")

summary(model_1)
Model: "Model_1"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
hidden_1 (Dense)                    (None, 128)                     100480      
________________________________________________________________________________
hidden_2 (Dense)                    (None, 64)                      8256        
________________________________________________________________________________
output (Dense)                      (None, 24)                      1560        
================================================================================
Total params: 110,296
Trainable params: 110,296
Non-trainable params: 0
________________________________________________________________________________
history_1 <- model_1 %>% 
  fit(train_x_array, 
      train_y_keras, 
      batch_size = 32, 
      epochs = 10,
      validation_data = list(test_x_array, test_y_keras))

plot(history_1)

pred_1 <- predict_classes(object = model_1, test_x_array)
conf_1 <- confusionMatrix(as.factor(pred_1), as.factor(test_y))
conf_1
Confusion Matrix and Statistics

          Reference
Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
        0  331   0   0   0   0   0   0   0  40   0   0   9  63   0   0   0   0
        1    0 369   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        2    0   0 289   0   0   2   2   0   0   0   0   0   0  23  21   0   0
        3    0  37   0 220   0   0   0   0  21   0   0   0   2   0   0   0   0
        4    0   0   0   0 476   0   0  21   0   0   0   0   4  19   0   0   0
        5    0   0  21   0   0 241   0   0   0  60   0   0   0   2   0   2   0
        6    0   0   0   0   0   0 273  20   0   0   0   0   5  20   0   0  18
        7    0   0   0   0   0   0  14 382   0   0   0   0   0   0   0   0   0
        8    0   0   0   0   0   0   0   1 206  17   0   0  11   0  17   0   0
        9    0   6   0   0   0   0   0   0   0 159   0   0   0   0   0   0   0
        10   0   0   0   0   0   0   0   0   0   0 188   0   0   0   0   0   0
        11   0   0   0   0   1   0  21   0   0   0   0 318  13   0   0  21   0
        12   0   0   0   0   0   0   0   0   0   0   0  20 105   0   0   0   0
        13   0   0   0   0   0   0   0   0   0   0   0   0  21 161   0   0   0
        14   0   0   0   0   0   0   1   0   0   0   0   0   0   0 300   0   0
        15   0   0   0   0   0   0  18   0   0   0   0   4  23   2   9 141   0
        16   0   6   0   4   0   0   0   0   0  30   0   0   0   0   0   0 103
        17   0   0   0   0  21   0   0   0   0   0   0  43  32   0   0   0   0
        18   0   0   0   1   0   4  17  11   0   0  19   0  12   0   0   0   1
        19   0  14   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1
        20   0   0   0   0   0   0   0   0   0  23   0   0   0   0   0   0  21
        21   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        22   0   0   0  20   0   0   2   0   0   0   2   0   0  19   0   0   0
        23   0   0   0   0   0   0   0   0  21  42   0   0   0   0   0   0   0
          Reference
Prediction  17  18  19  20  21  22  23
        0   17   0   0   0   0   0   0
        1    0   0   0   0   0   0   0
        2    0   5   0   0   0   0   0
        3   20   0  29   0   0   0   0
        4   21   0   0   0   0   0   0
        5    0   0   0  20   0  21   0
        6    0   0   0   0   0   0   0
        7   18   0   0   0   0   0   0
        8   25   0   0   0   0   0  29
        9    0   0  21   0  20   0  13
        10   0   0   0   0   0   8  41
        11  47   0   0   0   0   0   0
        12  18   0   0   0   0   0   0
        13   0   0   0   0   0   0   0
        14   0   0   0   0   0   0   0
        15   0   0   0   1   0   0   0
        16   0   0  18   0   0   0   0
        17  79   0   0   0   0   0   3
        18   0 181   0  24  21   1  21
        19   0   0 106   0  17   0   0
        20   0   0  50 244  44   0  19
        21   0   0   0  20 104  42   0
        22   0  62   0  19   0 195   0
        23   1   0  42  18   0   0 206

Overall Statistics
                                               
               Accuracy : 0.7497               
                 95% CI : (0.7395, 0.7597)     
    No Information Rate : 0.0694               
    P-Value [Acc > NIR] : < 0.00000000000000022
                                               
                  Kappa : 0.738                
                                               
 Mcnemar's Test P-Value : NA                   

Statistics by Class:

                     Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
Sensitivity           1.00000  0.85417  0.93226  0.89796  0.95582  0.97571
Specificity           0.98114  1.00000  0.99228  0.98426  0.99026  0.98181
Pos Pred Value        0.71957  1.00000  0.84503  0.66869  0.87985  0.65668
Neg Pred Value        1.00000  0.99074  0.99693  0.99635  0.99668  0.99912
Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
Detection Rate        0.04615  0.05145  0.04030  0.03067  0.06637  0.03360
Detection Prevalence  0.06414  0.05145  0.04769  0.04587  0.07543  0.05117
Balanced Accuracy     0.99057  0.92708  0.96227  0.94111  0.97304  0.97876
                     Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
Sensitivity           0.78448  0.87615  0.71528  0.48036   0.89952   0.80711
Specificity           0.99077  0.99525  0.98547  0.99123   0.99296   0.98480
Pos Pred Value        0.81250  0.92271  0.67320  0.72603   0.79325   0.75534
Neg Pred Value        0.98903  0.99201  0.98806  0.97526   0.99697   0.98874
Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
Detection Rate        0.03806  0.05326  0.02872  0.02217   0.02621   0.04434
Detection Prevalence  0.04685  0.05772  0.04267  0.03054   0.03305   0.05870
Balanced Accuracy     0.88763  0.93570  0.85038  0.73580   0.94624   0.89596
                     Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
Sensitivity            0.36082   0.65447   0.86455   0.85976   0.71528
Specificity            0.99448   0.99697   0.99985   0.99187   0.99175
Pos Pred Value         0.73427   0.88462   0.99668   0.71212   0.63975
Neg Pred Value         0.97354   0.98784   0.99316   0.99670   0.99415
Prevalence             0.04057   0.03430   0.04838   0.02287   0.02008
Detection Rate         0.01464   0.02245   0.04183   0.01966   0.01436
Detection Prevalence   0.01994   0.02538   0.04197   0.02761   0.02245
Balanced Accuracy      0.67765   0.82572   0.93220   0.92581   0.85351
                     Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
Sensitivity            0.32114   0.72984   0.39850   0.70520   0.50485
Specificity            0.98571   0.98094   0.99522   0.97700   0.99110
Pos Pred Value         0.44382   0.57827   0.76259   0.60848   0.62651
Neg Pred Value         0.97612   0.99023   0.97725   0.98494   0.98544
Prevalence             0.03430   0.03458   0.03709   0.04824   0.02872
Detection Rate         0.01102   0.02524   0.01478   0.03402   0.01450
Detection Prevalence   0.02482   0.04364   0.01938   0.05591   0.02315
Balanced Accuracy      0.65342   0.85539   0.69686   0.84110   0.74798
                     Class: 22 Class: 23
Sensitivity            0.73034   0.62048
Specificity            0.98204   0.98187
Pos Pred Value         0.61129   0.62424
Neg Pred Value         0.98949   0.98158
Prevalence             0.03723   0.04629
Detection Rate         0.02719   0.02872
Detection Prevalence   0.04448   0.04601
Balanced Accuracy      0.85619   0.80118
  • Check model_2
model_2 %>% 
    compile(loss = "categorical_crossentropy", 
          optimizer = optimizer_adam(learning_rate = 0.001),   
          metrics = "accuracy")

summary(model_2)
Model: "Model_2"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
hidden_1 (Dense)                    (None, 512)                     401920      
________________________________________________________________________________
hidden_2 (Dense)                    (None, 256)                     131328      
________________________________________________________________________________
hidden_3 (Dense)                    (None, 128)                     32896       
________________________________________________________________________________
output (Dense)                      (None, 24)                      3096        
================================================================================
Total params: 569,240
Trainable params: 569,240
Non-trainable params: 0
________________________________________________________________________________
history_2 <- model_2 %>% 
  fit(train_x_array, 
      train_y_keras, 
      batch_size = 32, 
      epochs = 10,
      validation_data = list(test_x_array, test_y_keras))

plot(history_2)

pred_2 <- predict_classes(object = model_2, test_x_array)
conf_2 <- confusionMatrix(as.factor(pred_2), as.factor(test_y))
conf_2
Confusion Matrix and Statistics

          Reference
Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
        0  331   0   0   0   0   0   0   0   0   0   0  14  42   0   0   0   0
        1    0 394   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        2    0   0 310   0   0   1   2   0   0   0   0   0   0   0   0   0   0
        3    0  38   0 209   0   0   0   0   0   0   0   0   0   0   0   0   0
        4    0   0   0   0 498   0   0   0   0   0   0   0   0   0   0   0   0
        5    0   0   0   0   0 246   0   0   0  21   0   0   0   0   0   0   0
        6    0   0   0   0   0   0 266  38   0   0   0   0   0   0   0   0   0
        7    0   0   0   0   0   0  21 397   0   0   0   0   0   0   0   0   0
        8    0   0   0   0   0   0   0   0 241  21   0   1   0   0   0   0   0
        9    0   0   0   0   0   0   0   0   0 198   0   0   0   0   0   0   0
        10   0   0   0   0   0   0   0   0   0   0 188   0   0   0   0   0   0
        11   0   0   0   0   0   0   0   0   0   0   0 329  40   0   0   0   0
        12   0   0   0   0   0   0  20   0   0   0   0  21 167   0   0   0   0
        13   0   0   0   0   0   0   0   1   0   0   0   0  21 238   0   0   0
        14   0   0   0   0   0   0   0   0   0   0   0   0   0   0 347   0   0
        15   0   0   0   0   0   0  22   0   1   0   0   0  21   3   0 164   0
        16   0   0   0  16   0   0   0   0   2  20   0   0   0   0   0   0  62
        17   0   0   0   0   0   0   0   0   2   0   0  29   0   0   0   0   0
        18   0   0   0   0   0   0  14   0   0   0   0   0   0   5   0   0   0
        19   0   0   0   0   0   0   3   0   0  71   0   0   0   0   0   0  61
        20   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  21
        21   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        22   0   0   0  20   0   0   0   0  21   0  21   0   0   0   0   0   0
        23   0   0   0   0   0   0   0   0  21   0   0   0   0   0   0   0   0
          Reference
Prediction  17  18  19  20  21  22  23
        0    0   0   0   0   0   0   0
        1    0   0   0   0  11   0   0
        2    0   0   0   0   0   0   0
        3   21  18   0   0   0  20   0
        4   34   0   0   0   0   0   0
        5    0   0   0   1   0   0   0
        6    0   0   0   0   0   0   0
        7   19   0   0   0   0   0   0
        8    1   0   0   0   0   0  22
        9    0   0   8  10   4   0   0
        10   0  13   0   0   0   0   1
        11  48   0   0   0   0   0   0
        12   1   0   0   0   0   0   0
        13   0   7   0   0   0   0   0
        14   0   0   0  17   0   0   0
        15   0   0   0   0   0   0   0
        16   0   0   1   0   5   0   0
        17 122   0   0   0   0   0  20
        18   0 166   0  18   0   2   0
        19   0   0 218  10   8   0   0
        20   0   0  20 247   0   0   0
        21   0   0   0  43 178  56   0
        22   0  44   0   0   0 189   0
        23   0   0  19   0   0   0 289

Overall Statistics
                                               
               Accuracy : 0.8358               
                 95% CI : (0.827, 0.8443)      
    No Information Rate : 0.0694               
    P-Value [Acc > NIR] : < 0.00000000000000022
                                               
                  Kappa : 0.828                
                                               
 Mcnemar's Test P-Value : NA                   

Statistics by Class:

                     Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
Sensitivity           1.00000  0.91204  1.00000  0.85306  1.00000  0.99595
Specificity           0.99181  0.99837  0.99956  0.98600  0.99491  0.99682
Pos Pred Value        0.85530  0.97284  0.99042  0.68301  0.93609  0.91791
Neg Pred Value        1.00000  0.99438  1.00000  0.99476  1.00000  0.99986
Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
Detection Rate        0.04615  0.05494  0.04322  0.02914  0.06944  0.03430
Detection Prevalence  0.05396  0.05647  0.04364  0.04267  0.07418  0.03737
Balanced Accuracy     0.99591  0.95520  0.99978  0.91953  0.99745  0.99639
                     Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
Sensitivity           0.76437  0.91055  0.83681  0.59819   0.89952   0.83503
Specificity           0.99443  0.99406  0.99346  0.99678   0.99799   0.98702
Pos Pred Value        0.87500  0.90847  0.84266  0.90000   0.93069   0.78897
Neg Pred Value        0.98806  0.99421  0.99317  0.98087   0.99699   0.99038
Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
Detection Rate        0.03709  0.05535  0.03360  0.02761   0.02621   0.04587
Detection Prevalence  0.04239  0.06093  0.03988  0.03067   0.02817   0.05814
Balanced Accuracy     0.87940  0.95231  0.91513  0.79749   0.94876   0.91102
                     Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
Sensitivity            0.57388   0.96748   1.00000   1.00000  0.430556
Specificity            0.99390   0.99581   0.99751   0.99329  0.993739
Pos Pred Value         0.79904   0.89139   0.95330   0.77725  0.584906
Neg Pred Value         0.98219   0.99884   1.00000   1.00000  0.988395
Prevalence             0.04057   0.03430   0.04838   0.02287  0.020078
Detection Rate         0.02328   0.03318   0.04838   0.02287  0.008645
Detection Prevalence   0.02914   0.03723   0.05075   0.02942  0.014780
Balanced Accuracy      0.78389   0.98165   0.99875   0.99665  0.712147
                     Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
Sensitivity            0.49593   0.66935   0.81955   0.71387   0.86408
Specificity            0.99264   0.99437   0.97785   0.99399   0.98579
Pos Pred Value         0.70520   0.80976   0.58760   0.85764   0.64260
Neg Pred Value         0.98228   0.98823   0.99294   0.98562   0.99594
Prevalence             0.03430   0.03458   0.03709   0.04824   0.02872
Detection Rate         0.01701   0.02315   0.03040   0.03444   0.02482
Detection Prevalence   0.02412   0.02858   0.05173   0.04016   0.03862
Balanced Accuracy      0.74429   0.83186   0.89870   0.85393   0.92493
                     Class: 22 Class: 23
Sensitivity            0.70787   0.87048
Specificity            0.98465   0.99415
Pos Pred Value         0.64068   0.87842
Neg Pred Value         0.98866   0.99372
Prevalence             0.03723   0.04629
Detection Rate         0.02635   0.04030
Detection Prevalence   0.04113   0.04587
Balanced Accuracy      0.84626   0.93232

Comparing the accuracy metric of both models:

result1 <- data.frame(
  'train_acc' = round(tail(history_1$metrics$accuracy, n=1),4),
  'test_acc' = round(tail(history_1$metrics$val_accuracy, n=1),4), 
  row.names = 'model_1')
result2 <- data.frame(
  'train_acc' = round(tail(history_2$metrics$accuracy, n=1),4),
  'test_acc' = round(tail(history_2$metrics$val_accuracy, n=1),4), 
  row.names = 'model_2')

perf <- rbind(result1, result2)

kbl(perf) %>%
  kable_styling("striped",font_size = 14,full_width = F, position = "center") 
          train_acc  test_acc
model_1      0.9702     0.7497
model_2      0.9757     0.8358

The accuracy values above show that the second model achieves noticeably better test accuracy than the first, and its train and test accuracy are also more balanced, whereas the first model shows a clear gap between train and test accuracy, i.e. overfitting.

Model Improvement

Data Augmentation

We can try to improve on the second model to get a better accuracy value. Here we will build artificial data using a method called image augmentation. Image augmentation is a useful technique for building models because it increases the size of the training set without acquiring new images: the goal is to teach the model not only the original images but also modifications of them, such as flipping, rotating, zooming, or cropping. This creates a more robust model. We can do data augmentation using the image data generator from keras.

To do image augmentation, we fit the data into a generator. Here, I will create the image generator for keras with the following parameters:

  • flip the image horizontally
  • no rotation (rotation_range = 0)
  • zoom in or out by up to 20%
  • randomly shift horizontally by up to a 0.2 fraction of the total width
  • randomly shift vertically by up to a 0.2 fraction of the total height
train_data_gen <- image_data_generator(
                                      horizontal_flip = T,
                                      rotation_range = 0,
                                      zoom_range = 0.2,
                                      width_shift_range = 0.2,
                                      height_shift_range = 0.2,
                                      fill_mode = 'nearest')

Now, since train_generator takes images as inputs, we need to reshape the input arrays from 784 flat values to (28, 28, 1). The 1 indicates a single grayscale channel; for RGB images the value would be 3.

train_x_gen <- train_x_array %>% 
  array_reshape(dim = c(nrow(train_x), 28, 28, 1))

test_x_gen <- test_x_array %>% 
  array_reshape(dim = c(nrow(test_x), 28, 28, 1))

train_generator <- flow_images_from_data(
                    x = train_x_gen,
                    y = train_y_keras,
                    generator = train_data_gen,
                    batch_size = 32,
                    seed = 123
)
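
To sanity-check the augmentation before training, we can draw a single batch from the generator and plot one image; this is an optional sketch using generator_next() from keras, not part of the original pipeline:

# Optional sanity check: draw one augmented batch and plot its first image
batch <- generator_next(train_generator)
dim(batch[[1]])  # expected: 32 28 28 1 (a batch of 32 grayscale 28 x 28 images)
par(mfrow = c(1, 1), mar = c(1, 1, 1, 1))
image(t(apply(batch[[1]][1, , , 1], 2, rev)), col = grey.colors(255),
      xaxt = "n", yaxt = "n")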

Convolutional Neural Network

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics. The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.
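
To make the idea of a learned filter concrete, here is a toy sketch (with made-up numbers) of what a single convolution does: a small kernel slides over the image, and each output value is the weighted sum of the pixel patch under it.

# Toy convolution: a 2 x 2 kernel over a 4 x 4 "image", valid padding, stride 1
img    <- matrix(1:16, nrow = 4, byrow = TRUE)  # tiny made-up image
kernel <- matrix(c(1, 0, 0, -1), nrow = 2)      # a simple diagonal-difference filter
out    <- matrix(0, 3, 3)                       # valid output shape is 3 x 3
for (i in 1:3) {
  for (j in 1:3) {
    patch     <- img[i:(i + 1), j:(j + 1)]      # the pixels under the kernel
    out[i, j] <- sum(patch * kernel)            # one feature-map value
  }
}
out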

In this case I want to improve the model using a CNN architecture. We can start building the model architecture for the deep learning. Here are the parameters of the improved model architecture:

  • 1st convolutional layer to extract features from the 2D image, with 32 4x4 filters and relu activation function
  • Max pooling layer with pool size 3 x 3
  • 2nd convolutional layer with 32 4x4 filters and relu activation function
  • Max pooling layer with pool size 3 x 3
  • 3rd convolutional layer with 32 4x4 filters and relu activation function
  • Max pooling layer with pool size 3 x 3
  • Flattening layer from 2D array to 1D array
  • Dense layer to capture more information, with 16 neurons and relu activation function
  • Dense layer for the output, with 24 neurons (because we have 24 categories) and a softmax activation function for probability output.
# create the model
tensorflow::tf$random$set_seed(123)

model_cnn <- keras_model_sequential(name = "Model_cnn") %>% 
  
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu",
                input_shape = c(28, 28, 1)
                ) %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  # input_shape is only needed on the first layer, so it is omitted from here on
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu") %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu") %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  layer_flatten() %>% 
  
  layer_dense(units = 16, 
              activation = "relu") %>% 
 
  layer_dense(units = 24, 
              activation = "softmax",
              name = "Output"
              )

model_cnn
Model
Model: "Model_cnn"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
conv2d_2 (Conv2D)                   (None, 28, 28, 32)              544         
________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)      (None, 9, 9, 32)                0           
________________________________________________________________________________
conv2d_1 (Conv2D)                   (None, 9, 9, 32)                16416       
________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)      (None, 3, 3, 32)                0           
________________________________________________________________________________
conv2d (Conv2D)                     (None, 3, 3, 32)                16416       
________________________________________________________________________________
max_pooling2d (MaxPooling2D)        (None, 1, 1, 32)                0           
________________________________________________________________________________
flatten (Flatten)                   (None, 32)                      0           
________________________________________________________________________________
dense (Dense)                       (None, 16)                      528         
________________________________________________________________________________
Output (Dense)                      (None, 24)                      408         
================================================================================
Total params: 34,312
Trainable params: 34,312
Non-trainable params: 0
________________________________________________________________________________
# compile model
model_cnn %>% 
  compile(optimizer = optimizer_adam(learning_rate = 0.001),
          loss = "categorical_crossentropy",
          metrics = "accuracy")

In this case we use the fit_generator() function, with train_generator as the generator, to fit the model. We also need to specify the steps_per_epoch parameter, which is the number of steps within one epoch, i.e. the number of training observations divided by the batch size (here ceiling(27455 / 32) = 858 steps). Lastly, we train the model for 50 epochs to squeeze out as much information as possible; note, however, that too many epochs may lead to overfitting and take longer to process.

# fitting the model
history_cnn <- model_cnn %>% 
  fit_generator(
    generator = train_generator,
    steps_per_epoch = ceiling(nrow(train_x_gen) / 32),
    epochs = 50,
    validation_data = list(test_x_gen, test_y_keras))

plot(history_cnn)

# evaluate the model
pred_cnn <- predict_classes(object = model_cnn, x = test_x_gen)
conf_cnn <- confusionMatrix(as.factor(pred_cnn), as.factor(test_y))
conf_cnn
Confusion Matrix and Statistics

          Reference
Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
        0  331   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        1    0 415   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        2    0   0 310   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        3    0   0   0 184   0   0   0   0   0   0   1   0   0   0   0   0   0
        4    0   0   0   8 456   0   0   0   0   0   0   0   0   0   0   0   0
        5    0   0   0   0   0 247   0   0   0   0   0   0   0   0   0   0   0
        6    0   0   0   0   0   0 337  45   0   0   0   0   0   0   0   0   0
        7    0   0   0   0   0   0   0 379   0   0   0   0   0   0   0   0   0
        8    0   0   0   0   0   0   0   0 287   0   0   0   0   0   0   0   0
        9    0   0   0   0   0   0   0   0   0 331   0   0   0   0   0   0   0
        10   0   0   0   4   0   0   0   0   0   0 191   0   0   0   0   0   0
        11   0   0   0   0   0   0   0   0   0   0   0 235   0   0   0   0   0
        12   0   0   0   3  42   0   0   0   0   0   0 159 291   0   0   0   0
        13   0   0   0   0   0   0   0   0   0   0   0   0   0 246   0   0   0
        14   0   0   0   0   0   0   9  12   0   0   0   0   0   0 347   7   0
        15   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 157   0
        16   0   0   0  46   0   0   0   0   0   0   0   0   0   0   0   0 128
        17   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  13
        18   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        19   0   7   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2
        20   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
        21   0  10   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        22   0   0   0   0   0   0   2   0   0   0   0   0   0   0   0   0   0
        23   0   0   0   0   0   0   0   0   1   0  17   0   0   0   0   0   0
          Reference
Prediction  17  18  19  20  21  22  23
        0    8   0   0   0   0   0   0
        1    0   0   0   0   0   0   0
        2    0   0   0   0   0   0   0
        3    0   0   0   0   0   0   0
        4    0   0   0   0   0   0   0
        5    0   0   0   0   0   0   0
        6    0   0   0   0   0   0   0
        7    0   0   0   0   0   0   0
        8    4   0   0   0   0   0   0
        9    0   0   0   1   0   0   0
        10   0   0   0   0   0   0   0
        11   1   0   0   0   0   0   0
        12   0   0   0   0   0   0   0
        13   0   0   0   0   0   0   0
        14   0   0   0   0   0   0   0
        15   0   0   0   0   0   0   0
        16   0   0   0   0   0   0   0
        17 233   0   0   0   0   0   0
        18   0 248   0   0   0   0   0
        19   0   0 266   0   0   0   0
        20   0   0   0 288   0   0   0
        21   0   0   0  57 206   0   0
        22   0   0   0   0   0 267   0
        23   0   0   0   0   0   0 332

Overall Statistics
                                               
               Accuracy : 0.9359               
                 95% CI : (0.9299, 0.9414)     
    No Information Rate : 0.0694               
    P-Value [Acc > NIR] : < 0.00000000000000022
                                               
                  Kappa : 0.9329               
                                               
 Mcnemar's Test P-Value : NA                   

Statistics by Class:

                     Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
Sensitivity           1.00000  0.96065  1.00000  0.75102  0.91566  1.00000
Specificity           0.99883  1.00000  1.00000  0.99986  0.99880  1.00000
Pos Pred Value        0.97640  1.00000  1.00000  0.99459  0.98276  1.00000
Neg Pred Value        1.00000  0.99748  1.00000  0.99127  0.99374  1.00000
Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
Detection Rate        0.04615  0.05786  0.04322  0.02566  0.06358  0.03444
Detection Prevalence  0.04727  0.05786  0.04322  0.02579  0.06470  0.03444
Balanced Accuracy     0.99942  0.98032  1.00000  0.87544  0.95723  1.00000
                     Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
Sensitivity           0.96839  0.86927  0.99653  1.00000   0.91388   0.59645
Specificity           0.99341  1.00000  0.99942  0.99985   0.99943   0.99985
Pos Pred Value        0.88220  1.00000  0.98625  0.99699   0.97949   0.99576
Neg Pred Value        0.99838  0.99161  0.99985  1.00000   0.99742   0.97708
Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
Detection Rate        0.04699  0.05284  0.04002  0.04615   0.02663   0.03277
Detection Prevalence  0.05326  0.05284  0.04057  0.04629   0.02719   0.03291
Balanced Accuracy     0.98090  0.93463  0.99797  0.99993   0.95665   0.79815
                     Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
Sensitivity            1.00000    1.0000   1.00000   0.95732   0.88889
Specificity            0.97035    1.0000   0.99590   1.00000   0.99345
Pos Pred Value         0.58788    1.0000   0.92533   1.00000   0.73563
Neg Pred Value         1.00000    1.0000   1.00000   0.99900   0.99771
Prevalence             0.04057    0.0343   0.04838   0.02287   0.02008
Detection Rate         0.04057    0.0343   0.04838   0.02189   0.01785
Detection Prevalence   0.06902    0.0343   0.05229   0.02189   0.02426
Balanced Accuracy      0.98518    1.0000   0.99795   0.97866   0.94117
                     Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
Sensitivity            0.94715   1.00000   1.00000   0.83237   1.00000
Specificity            0.99812   1.00000   0.99870   0.99985   0.99038
Pos Pred Value         0.94715   1.00000   0.96727   0.99654   0.75458
Neg Pred Value         0.99812   1.00000   1.00000   0.99157   1.00000
Prevalence             0.03430   0.03458   0.03709   0.04824   0.02872
Detection Rate         0.03249   0.03458   0.03709   0.04016   0.02872
Detection Prevalence   0.03430   0.03458   0.03834   0.04030   0.03806
Balanced Accuracy      0.97264   1.00000   0.99935   0.91611   0.99519
                     Class: 22 Class: 23
Sensitivity            1.00000   1.00000
Specificity            0.99971   0.99737
Pos Pred Value         0.99257   0.94857
Neg Pred Value         1.00000   1.00000
Prevalence             0.03723   0.04629
Detection Rate         0.03723   0.04629
Detection Prevalence   0.03751   0.04880
Balanced Accuracy      0.99986   0.99868

We can observe from the confusion matrix that most classes are now predicted almost perfectly; only a few classes remain hard to differentiate.
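
To pin down which classes those are, we can sort the per-class sensitivity stored in the byClass element of the caret confusion matrix object; a quick sketch:

# Which letters does the CNN still struggle with? Lowest sensitivity first
sens <- conf_cnn$byClass[, "Sensitivity"]
head(sort(sens), 5)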

resultcnn <- data.frame(
  'train_acc' = round(tail(history_cnn$metrics$accuracy, n=1),4),
  'test_acc' = round(tail(history_cnn$metrics$val_accuracy, n=1),4),
  row.names = 'model_cnn'
)

perfall <- rbind(perf, resultcnn)

kbl(perfall) %>%
  kable_styling("striped",font_size = 14,full_width = F, position = "center") 
          train_acc  test_acc
model_1      0.9702     0.7497
model_2      0.9757     0.8358
model_cnn    0.9376     0.9359

From the output, the CNN model’s accuracy is better than the other models’: we get 93.59% on the test data and 93.76% on the train data. The accuracy is also well balanced between the train and test datasets.
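
Since model_cnn gives the best and most balanced performance, it may be worth persisting it for reuse; a minimal sketch using keras’ HDF5 serializer (the file name is just an example):

# Save the best model for later use (file name is arbitrary)
save_model_hdf5(model_cnn, "model_cnn.h5")
# ...and reload it later with:
# model_cnn <- load_model_hdf5("model_cnn.h5")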

Conclusion

A neural network (NN) is very suitable for image classification problems, because it is hard to extract features from images manually and an NN can do this internally without us worrying about which features to extract. For our problem, we saw that a plain NN model may lead to overfitting. As shown in this case, data augmentation and a CNN implementation for image classification are possible with a few lines of code. Keras provides a simple way to implement multiple types of CNN architectures and facilitates easy fine-tuning of hyperparameters, so that models can be optimized and overfitting reduced. I can’t claim that this is the optimal model for this case; many things could still be improved, and we might obtain better model performance using other methods or by adjusting the network architecture.
