# chunk options
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)

Introduction

We need to build our classification model to classify the categories of sign language images using a neural network algorithm in the Keras framework by following these steps:

1 Data Preparation

Let us start our neural network experience by first preparing the dataset. We will use the sign-language-mnist dataset, which can be downloaded from the following page. The files to download are sign-mnist-train.csv as the train data and sign-mnist-test.csv as the test data. Both files store sign language images measuring 28 x 28 pixels for 24 different categories.

1.1 Load the library and data

Please load the following packages.

library(dplyr) # data wrangling
library(keras) # deep learning interface
library(caret) # confusion matrix and model evaluation

In this phase, please load and check the sign-mnist-train.csv and sign-mnist-test.csv data, then store them as sign_train and sign_test.

# your code here
sign_train <- read.csv("datasets/sign_mnist_train/sign_mnist_train.csv")
sign_test <-  read.csv("datasets/sign_mnist_test/sign_mnist_test.csv")

# your code here (check dimensions)
dim(sign_train)
## [1] 27455   785
dim(sign_test)
## [1] 7172  785

Inspect the sign_train data by using head() function.

# your code here
head(sign_train)
head(sign_test)

The sign_train data consists of 27455 observations and 785 variables (1 target and 784 predictors). Each predictor represents one pixel of the image.

1.2 Fix the categories on the target variable

Check the categories of the target variable in both the sign_train and sign_test data by using the unique() function.

# your code here
unique(sign_train$label)
##  [1]  3  6  2 13 16  8 22 18 10 20 17 19 21 23 24  1 12 11 15  4  0  5  7 14
unique(sign_test$label)
##  [1]  6  5 10  0  3 21 14  7  8 12  4 22  2 15  1 13 19 18 23 17 20 16 11 24

We need to fix the categories of the target variable in both the sign_train and sign_test data. Since labels 9 and 25 are missing, we can subtract 1 from all labels greater than 9. In this way, our labels become all integers from 0 to 23. We can use the mutate() and ifelse() functions to fix the categories of the target variable in both the sign_train and sign_test data.

Use the code below to fix the categories on the target variable in the sign_train and sign_test data.

sign_train <- sign_train %>% 
  mutate(label = ifelse(label > 9, label-1, label))

sign_test <- sign_test %>% 
  mutate(label = ifelse(label > 9, label-1, label))
# function to visualize image data from the csv
vizTrain <- function(input){
  
  dimmax <- sqrt(ncol(input[,-1]))
  
  dimn <- ceiling(sqrt(nrow(input)))
  par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
  
  for (i in 1:nrow(input)){
      m1 <- as.matrix(input[i,2:785])
      dim(m1) <- c(28,28)
      
      m1 <- apply(apply(m1, 1, rev), 1, t)
      
      image(1:28, 1:28, 
            m1, col=grey.colors(255), 
            # remove axis text
            xaxt = 'n', yaxt = 'n')
      text(2, 20, col="white", cex=1.2, input[i, 1])
  }
  
}

vizTrain(head(sign_train, 36))

1.3 Separate predictors and targets, convert data into matrices, and scale features

The data contains the pixel values stored in a data.frame. However, we have to separate predictors and targets for the sign_train and sign_test data and store them as train_x, train_y, test_x, and test_y. We can use the select() function to separate predictors and targets in the sign_train and sign_test data.

After that, convert the train_x, train_y, test_x, and test_y data into matrices before we create a model. Please convert the data into matrix format using the as.matrix() (or data.matrix()) function. For the predictor variables stored in train_x and test_x in particular, perform feature scaling by dividing by 255.

# Predictor variables in `sign_train`
train_x <-  sign_train %>% 
  select(-label) %>% 
  as.matrix()/255

# Predictor variables in `sign_test`
test_x <- sign_test %>% 
  select(-label) %>% 
  as.matrix()/255

# Target variable in `sign_train`
train_y <- sign_train$label

# Target variable in `sign_test`
test_y <- sign_test$label

range(train_x)
## [1] 0 1
range(test_x)
## [1] 0 1

If we inspect an image in the training set, we will see that the pixel values fall in the range of 0 to 255. The purpose of dividing the values in the array by 255 is to normalize them from the 0-255 range to the 0-1 range.
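As a quick optional sanity check (assuming the images use standard 8-bit grayscale encoding), we can compare the raw pixel range with the scaled range:

# raw pixel values before scaling; these should span roughly 0 to 255
range(data.matrix(sign_train[, -1]))

# scaled values after dividing by 255; these should span 0 to 1
range(train_x)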

1.4 Converting matrix to array

Next, we have to convert the predictor matrix into an array form. We can use the array_reshape(data, dim(data)) function to convert the predictor matrix into an array.

# Predictor variables in `train_x`
train_x_array <- array_reshape(x = train_x, dim = dim(train_x)) 

# Predictor variables in `test_x`
test_x_array <- array_reshape(x = test_x, dim = dim(test_x))

We should also apply one-hot encoding to the target variable (train_y) using the to_categorical() function from keras and store it as the train_y_dummy object.

# Target variable in `train_y`
train_y_dummy <- to_categorical(train_y)
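
As a small optional check, the dummy matrix should have one column per class; with labels 0 to 23 this means 24 columns:

# dimensions of the one-hot encoded target: 27455 observations x 24 class columns
dim(train_y_dummy)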

2 Build Neural Network Model

2.1 Build a model base using keras_model_sequential()

To organize the layers, we should create a base model, which is a sequential model. Call the keras_model_sequential() function, and pipe the base model into the model architecture.
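A minimal sketch of this pattern is shown below (illustrative only; model_sketch is a throwaway name, and the actual architectures for this task are built in the next subsection):

# sketch of the sequential API: a base model piped into dense layers
model_sketch <- keras_model_sequential() %>% 
  layer_dense(units = 8, activation = "relu", input_shape = 4) %>% 
  layer_dense(units = 3, activation = "softmax")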

2.2 Building Architecture (define layers, neurons, and activation function)

To define the architecture for each layer, we will build several models by tuning several parameters.

First, create a model (store it as model_base) by defining the following parameters:

  • the first layer contains 64 nodes, relu activation function, 784 input shape

  • the second layer contains 32 nodes, relu activation function

  • the third layer contains 24 nodes, softmax activation function

But before building the architecture, we set the seed for the random weight initialization in the first epoch with set_random_seed() from tensorflow. Make sure to run this whole chunk at once so that it works.

input_dim <- dim(train_x_array)[2]

num_class <- length(unique(sign_train$label))
# your code here
tensorflow::set_random_seed(8)

model_base <- keras_model_sequential(name = "Model-Base") %>% 
  
  # input layer + hidden layer 1
  layer_dense(input_shape = input_dim, # input size (number of predictors)
              units = 64, # number of nodes in this layer 
              activation  = 'relu', # activation function
              name = 'hidden_1') %>%  
  
  # hidden layer 2
  layer_dense(units = 32,
              activation = 'relu',
              name = "hidden_2") %>% 
  
  # output layer
  layer_dense(units = num_class, 
              activation = 'softmax', 
              name='output')

summary(model_base)
## Model: "Model-Base"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  hidden_1 (Dense)                   (None, 64)                      50240       
##  hidden_2 (Dense)                   (None, 32)                      2080        
##  output (Dense)                     (None, 24)                      792         
## ================================================================================
## Total params: 53,112
## Trainable params: 53,112
## Non-trainable params: 0
## ________________________________________________________________________________

Second, create a model (store it as model_bigger) by defining the following parameters:

  • the first layer contains 256 nodes, relu activation function, 784 input shape

  • the second layer contains 128 nodes, relu activation function

  • the third layer contains 64 nodes, relu activation function

  • the fourth layer contains 24 nodes, softmax activation function

# your code here
tensorflow::set_random_seed(8)
model_bigger <- keras_model_sequential(name = "Model-Bigger") %>% 
  
  # input layer + hidden layer 1
  layer_dense(input_shape = input_dim, # input size (number of predictors)
              units = 256, # number of nodes in this layer 
              activation  = 'relu', # activation function
              name = 'hidden_1') %>%  
  
  # hidden layer 2
  layer_dense(units = 128,
              activation = 'relu',
              name = "hidden_2") %>% 
  
  # hidden layer 3
  layer_dense(units = 64,
              activation = 'relu',
              name = "hidden_3") %>% 
  
  # output layer
  layer_dense(units = num_class, 
              activation = 'softmax', 
              name='output')

summary(model_bigger)
## Model: "Model-Bigger"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  hidden_1 (Dense)                   (None, 256)                     200960      
##  hidden_2 (Dense)                   (None, 128)                     32896       
##  hidden_3 (Dense)                   (None, 64)                      8256        
##  output (Dense)                     (None, 24)                      1560        
## ================================================================================
## Total params: 243,672
## Trainable params: 243,672
## Non-trainable params: 0
## ________________________________________________________________________________

2.3 Building Architecture (define cost function and optimizer)

We still need to configure several settings before training model_base and model_bigger. We must compile each model by defining the loss function, optimizer type, and evaluation metrics. Please compile model_base and model_bigger by setting these parameters:

  • categorical_crossentropy as the loss function

  • optimizer_adam as the optimizer with learning rate of 0.001

  • use accuracy as the evaluation metric

# your code here
model_base %>% compile(
        loss = "categorical_crossentropy",
        optimizer = optimizer_adam(learning_rate = 0.001),
        metrics = 'accuracy' 
)
# your code here
model_bigger %>% compile(
        loss = "categorical_crossentropy",
        optimizer = optimizer_adam(learning_rate = 0.001),
        metrics = 'accuracy' 
)

2.4 Fitting model in the training set (define epoch and batch size)

In this step, we fit our models using epochs = 10, batch_size = 150, and the parameter shuffle = F so that the samples in each batch are not taken randomly but in sequence, for both model_base and model_bigger. We can save the fitted model results as history_base and history_bigger.

# your code here
history_base <- model_base %>% fit(x = train_x_array,
                          y = train_y_dummy,
                          epochs = 10,
                          batch_size = 150,
                          shuffle = F,
                          verbose = 1)

plot(history_base)

# your code here
history_bigger <- model_bigger %>% fit(x = train_x_array,
                          y = train_y_dummy,
                          epochs = 10,
                          batch_size = 150,
                          shuffle = F,
                          verbose = 1)

plot(history_bigger)

Note: In the model fitting above, epochs = 10 means that the model performs the feed-forward and back-propagation pass over all batches 10 times.
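As a rough illustration of what this means in terms of weight updates (assuming Keras keeps the last, smaller batch rather than dropping it):

# number of batches per epoch with batch_size = 150
batches_per_epoch <- ceiling(nrow(train_x_array) / 150) # 27455 / 150 -> 184 batches
# each batch triggers one weight update, so 10 epochs give roughly 1840 updates
batches_per_epoch * 10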

3 Predicting on the testing set

To evaluate the model performance on unseen data, we will predict the testing data (test_x_array) using the trained models. Please predict using the predict() function and store the results as pred_base and pred_bigger.

# your code here
pred_base <- predict(object = model_base, x = test_x_array)  %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()

pred_bigger <- predict(object = model_bigger, x = test_x_array)  %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()

4 Evaluating the neural network model

4.1 Confusion Matrix (classification)

We can evaluate the model using several metrics. Check the accuracy by creating a confusion matrix. We can use confusionMatrix() from the caret package. Also apply the explicit coercion as.factor() if your data is not yet stored as a factor.

# your code here
confusionMatrix(data = pred_base, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##         0  323   0   0   0   0   0   0   0   3   0   0   2  61   0   0  21   0
##         1    0 307   0   0   0   0   0   0   8   0   0   0   0   0   0   0   0
##         2    0   0 251   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         3    0   0   0 149   0   0   0   0   0   0   0   0   0   0   0   0   0
##         4    1   0   0   0 412   0   0  21   0   0   0  63  28  32   0   0   0
##         5    0   0  21   0   0 194   5   0   0   0   4   0   0  53   0   6   0
##         6    0   0  17   0   0   0 180  39   0   0   0   0   0  20   0   0   0
##         7    0   0   0   0   0   0  64 338   0   0   0   6   0  17   0   1   0
##         8    0   0   0   0   0   0  23   6 203  21   0   0   1   0   2   0   1
##         9    0  92   0   0   0   0   0   0   0 200   0   0   0   0   4   0   1
##         10   0   0   0   0   0   0   0   0   0   0 205   0   0   0   0   0  20
##         11   0   0   0   0   0   0   0   0   0   0   0 111   0   0   0   0   0
##         12   0   0   0   0   0   0  18   0   5   0   0  42  79  14   0  13   0
##         13   1   0   0   0   0   0   0   2   0   0   0  21  18  99   0   0   0
##         14   0   0   0   0   0   0  19   0   0   0   0   0   0   0 317   0   0
##         15   0   0   0   0   0   0   5   0   0   0   0  19  19   5   3 123   0
##         16   0   9   0  37   0   0   0   0   2  83   0   0   0   0   0   0  81
##         17   6  21   0   7  86   0   0   0  33   0   0 130  56   0   0   0  20
##         18   0   0  21   0   0  14  19  13   0   0   0   0  24   1   0   0   0
##         19   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         20   0   3   0  18   0  18   0  17   0   0   0   0   0   3   0   0  21
##         21   0   0   0   0   0  21   0   0  12   0   0   0   0   0  16   0   0
##         22   0   0   0  34   0   0  15   0   1   0   0   0   5   2   5   0   0
##         23   0   0   0   0   0   0   0   0  21  27   0   0   0   0   0   0   0
##           Reference
## Prediction  17  18  19  20  21  22  23
##         0    0   0   0   0   0   0   0
##         1    0   0   0   0   6   0   0
##         2    0   0   0   0   0   0   0
##         3    0   0  14   0   0   0   0
##         4   21   0   0   0   0   0   0
##         5    0   0   0  20   0   0   0
##         6    0   0   0   0   0   0   0
##         7    0  21   0   0   0   0   1
##         8  114  22   0   0   0  22  82
##         9    0   0  88  38  28   0   0
##         10   0  27   0   0   0  21  41
##         11  43   0   0   0   0   0   0
##         12   6   0   0   0   0   0   0
##         13   0   0   0   0   0   0   0
##         14   0   9   0   0   0  18   0
##         15   0   0   0   0   0   0   0
##         16   0   3  74  72   4  12   0
##         17  62   0   0   0   0   0  40
##         18   0  81   0   0   0   0   0
##         19   0  13  21   0   0   0   0
##         20   0   8  67 195  89   7  21
##         21   0  13   0   1  79  44   0
##         22   0  51   0  20   0 143   0
##         23   0   0   2   0   0   0 147
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5996          
##                  95% CI : (0.5881, 0.6109)
##     No Information Rate : 0.0694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5812          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity           0.97583  0.71065  0.80968  0.60816  0.82731  0.78543
## Specificity           0.98728  0.99792  1.00000  0.99798  0.97513  0.98426
## Pos Pred Value        0.78780  0.95639  1.00000  0.91411  0.71280  0.64026
## Neg Pred Value        0.99882  0.98175  0.99148  0.98630  0.98696  0.99228
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.04504  0.04281  0.03500  0.02078  0.05745  0.02705
## Detection Prevalence  0.05717  0.04476  0.03500  0.02273  0.08059  0.04225
## Balanced Accuracy     0.98156  0.85429  0.90484  0.80307  0.90122  0.88484
##                      Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.51724  0.77523  0.70486  0.60423   0.98086   0.28173
## Specificity           0.98886  0.98367  0.95729  0.96331   0.98435   0.99366
## Pos Pred Value        0.70312  0.75446  0.40845  0.44346   0.65287   0.72078
## Neg Pred Value        0.97571  0.98543  0.98727  0.98051   0.99942   0.95968
## Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
## Detection Rate        0.02510  0.04713  0.02830  0.02789   0.02858   0.01548
## Detection Prevalence  0.03569  0.06247  0.06930  0.06288   0.04378   0.02147
## Balanced Accuracy     0.75305  0.87945  0.83108  0.78377   0.98260   0.63769
##                      Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity            0.27148   0.40244   0.91354   0.75000   0.56250
## Specificity            0.98576   0.99394   0.99326   0.99272   0.95788
## Pos Pred Value         0.44633   0.70213   0.87328   0.70690   0.21485
## Neg Pred Value         0.96969   0.97909   0.99559   0.99414   0.99073
## Prevalence             0.04057   0.03430   0.04838   0.02287   0.02008
## Detection Rate         0.01102   0.01380   0.04420   0.01715   0.01129
## Detection Prevalence   0.02468   0.01966   0.05061   0.02426   0.05257
## Balanced Accuracy      0.62862   0.69819   0.95340   0.87136   0.76019
##                      Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity           0.252033   0.32661  0.078947   0.56358   0.38350
## Specificity           0.942391   0.98671  0.998118   0.96015   0.98464
## Pos Pred Value        0.134490   0.46821  0.617647   0.41756   0.42473
## Neg Pred Value        0.972582   0.97614  0.965677   0.97748   0.98182
## Prevalence            0.034300   0.03458  0.037089   0.04824   0.02872
## Detection Rate        0.008645   0.01129  0.002928   0.02719   0.01102
## Detection Prevalence  0.064278   0.02412  0.004741   0.06511   0.02593
## Balanced Accuracy     0.597212   0.65666  0.538532   0.76187   0.68407
##                      Class: 22 Class: 23
## Sensitivity            0.53558   0.44277
## Specificity            0.98074   0.99269
## Pos Pred Value         0.51812   0.74619
## Neg Pred Value         0.98202   0.97348
## Prevalence             0.03723   0.04629
## Detection Rate         0.01994   0.02050
## Detection Prevalence   0.03848   0.02747
## Balanced Accuracy      0.75816   0.71773
confusionMatrix(data = pred_bigger, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##         0  331   0   0   0   0   0   0   0  40   0   0  20  42   0   0   0   0
##         1    0 400   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         2    0   0 268   0   0  16   0   0   0   0   0   0   0   3  21   0   0
##         3    0   4   0 204   0   0   0   0   0   0   0   0   0   0   0   0   0
##         4    0   0   0   0 423   0   0   0   0   0   0  21   0   0   0   0   0
##         5    0   0  21   0   0 231   0   0   0  39   0   0   0  21   0   0   0
##         6    0   0   0   0   0   0 194   3   0   0   0   0   0   0   0   0   0
##         7    0   0   0   0   0   0  56 395   0   0   0   0   0   0   0   0   0
##         8    0   0   0   0   0   0   0   0 180   0   0   0   0   0   0   0  20
##         9    0  21   0   0   0   0   0   0   0 155   0   0   0   0   0   0  10
##         10   0   0   0   0   0   0   0   0   0   0 148   0   0   0   0   0  21
##         11   0   0   0   0   0   0   0  21   0   0   0 183   2   0   0  21   0
##         12   0   0   0   0   0   0  16   0   0   0   0  76 125   0   0   2  19
##         13   0   0   0   0   0   0  20   0   0   0   0   2  39 165   0   0   0
##         14   0   0   0   0   0   0   0   0   0   4   0   0   0   0 289   0   0
##         15   0   0   0   0   0   0  26   0   8   0   0  24  21  36   2 141   0
##         16   0   0   0   0   0   0   0   0   0  39   0   0   0   0   0   0  57
##         17   0   3   0   0  75   0   0   0  14  15   0  68  61   0  35   0  16
##         18   0   0  15   0   0   0  36  17   0   0   1   0   1  21   0   0   0
##         19   0   2   0   0   0   0   0   0   0  20   0   0   0   0   0   0   0
##         20   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         21   0   2   0   0   0   0   0   0   2  18   0   0   0   0   0   0   0
##         22   0   0   6  41   0   0   0   0  19   0  60   0   0   0   0   0   0
##         23   0   0   0   0   0   0   0   0  25  41   0   0   0   0   0   0   1
##           Reference
## Prediction  17  18  19  20  21  22  23
##         0    0   0   0   0   0   0   0
##         1    0   0   0   0   0   0   0
##         2    0   0   0   0   0   0   0
##         3    0   0  83   0   0   0   4
##         4   21   0   0   0   0   0   0
##         5    0   0   0  20   4   0   0
##         6    0   0   0  29   0   0   0
##         7   20  21   0   0   0   0   0
##         8   27  21   4   1  20   0  22
##         9    0   0  51  61   5   0  22
##         10   0   0   0   0   0   0  21
##         11  25   0   0   0   0   0   0
##         12  12   0   0   0   0   0   0
##         13   0   0   0   0   0   0   0
##         14   0   0   0  18   0  21   0
##         15   1   0   0   1   0   0   0
##         16   0   0  38   1   3   0   4
##         17 140   0   0   5   0   0  20
##         18   0 141   0  27   0   2  21
##         19   0   0  40   0  28   0   0
##         20   0   0  23 117   0   0   0
##         21   0   0   0  40 146  14   0
##         22   0  65   0   5   0 230   0
##         23   0   0  27  21   0   0 218
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6861          
##                  95% CI : (0.6753, 0.6969)
##     No Information Rate : 0.0694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6718          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity           1.00000  0.92593  0.86452  0.83265  0.84940  0.93522
## Specificity           0.98509  1.00000  0.99417  0.98686  0.99371  0.98484
## Pos Pred Value        0.76443  1.00000  0.87013  0.69153  0.90968  0.68750
## Neg Pred Value        1.00000  0.99527  0.99388  0.99404  0.98882  0.99766
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.04615  0.05577  0.03737  0.02844  0.05898  0.03221
## Detection Prevalence  0.06037  0.05577  0.04294  0.04113  0.06484  0.04685
## Balanced Accuracy     0.99254  0.96296  0.92934  0.90976  0.92155  0.96003
##                      Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.55747  0.90596  0.62500  0.46828   0.70813   0.46447
## Specificity           0.99531  0.98560  0.98329  0.97515   0.99397   0.98982
## Pos Pred Value        0.85841  0.80285  0.61017  0.47692   0.77895   0.72619
## Neg Pred Value        0.97783  0.99386  0.98430  0.97430   0.99126   0.96951
## Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
## Detection Rate        0.02705  0.05508  0.02510  0.02161   0.02064   0.02552
## Detection Prevalence  0.03151  0.06860  0.04113  0.04532   0.02649   0.03514
## Balanced Accuracy     0.77639  0.94578  0.80415  0.72171   0.85105   0.72714
##                      Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity            0.42955   0.67073   0.83285   0.85976  0.395833
## Specificity            0.98183   0.99119   0.99370   0.98302  0.987906
## Pos Pred Value         0.50000   0.73009   0.87048   0.54231  0.401408
## Neg Pred Value         0.97602   0.98834   0.99152   0.99667  0.987624
## Prevalence             0.04057   0.03430   0.04838   0.02287  0.020078
## Detection Rate         0.01743   0.02301   0.04030   0.01966  0.007948
## Detection Prevalence   0.03486   0.03151   0.04629   0.03625  0.019799
## Balanced Accuracy      0.70569   0.83096   0.91328   0.92139  0.691869
##                      Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity            0.56911   0.56855  0.150376   0.33815   0.70874
## Specificity            0.95495   0.97964  0.992760   0.99663   0.98909
## Pos Pred Value         0.30973   0.50000  0.444444   0.83571   0.65766
## Neg Pred Value         0.98423   0.98447  0.968088   0.96743   0.99137
## Prevalence             0.03430   0.03458  0.037089   0.04824   0.02872
## Detection Rate         0.01952   0.01966  0.005577   0.01631   0.02036
## Detection Prevalence   0.06302   0.03932  0.012549   0.01952   0.03095
## Balanced Accuracy      0.76203   0.77409  0.571568   0.66739   0.84891
##                      Class: 22 Class: 23
## Sensitivity            0.86142   0.65663
## Specificity            0.97161   0.98319
## Pos Pred Value         0.53991   0.65465
## Neg Pred Value         0.99452   0.98333
## Prevalence             0.03723   0.04629
## Detection Rate         0.03207   0.03040
## Detection Prevalence   0.05940   0.04643
## Balanced Accuracy      0.91652   0.81991

From the two confusion matrices above, we can conclude that with more hidden layers and neurons, the model may achieve better performance, because more features can be extracted from the data.

4.2 Model Tuning

Because neither model has provided a good enough (best-fit) performance, where model_base tends to underfit and model_bigger tends to overfit, improvements will be made to model_bigger. Now, let's try to build model_tuning by defining the following parameters:

  • the first layer contains 128 nodes, relu activation function, 784 input shape

  • the second layer contains 64 nodes, relu activation function

  • the third layer contains 24 nodes, softmax activation function

# your code here
tensorflow::set_random_seed(8)
model_tuning <- keras_model_sequential(name = "Model-Tuning") %>% 
  
  # input layer + hidden layer 1
  layer_dense(input_shape = input_dim, # input size (number of predictors)
              units = 128, # number of nodes in this layer 
              activation  = 'relu', # activation function
              name = 'hidden_1') %>%  
  
  # hidden layer 2
  layer_dense(units = 64,
              activation = 'relu',
              name = "hidden_2") %>% 
  
  # output layer
  layer_dense(units = num_class, 
              activation = 'softmax', 
              name='output')

summary(model_tuning)
## Model: "Model-Tuning"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  hidden_1 (Dense)                   (None, 128)                     100480      
##  hidden_2 (Dense)                   (None, 64)                      8256        
##  output (Dense)                     (None, 24)                      1560        
## ================================================================================
## Total params: 110,296
## Trainable params: 110,296
## Non-trainable params: 0
## ________________________________________________________________________________

Then, compile the model by setting these parameters:

  • categorical_crossentropy as the loss function

  • optimizer_adam as the optimizer with learning rate of 0.001

  • use accuracy as the evaluation metric

# your code here
model_tuning %>% compile(
        loss = "categorical_crossentropy",
        optimizer = optimizer_adam(learning_rate = 0.001),
        metrics = 'accuracy' 
)

Last, fit the model using epochs = 10, batch_size = 150, and the parameter shuffle = F so that the samples in each batch are not taken randomly but in sequence.

# your code here
history_tuning <- model_tuning %>% fit(x = train_x_array,
                      y = train_y_dummy,
                      epochs = 10,
                      batch_size = 150,
                      shuffle = F,
                      verbose = 1)

plot(history_tuning)

After tuning the model, predict test_x_array using model_tuning. Please predict using the predict() function and store the result as pred_tuning.

# your code here
pred_tuning <- predict(object = model_tuning, x = test_x_array) %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()

Check the model performance using accuracy. We can use the confusionMatrix() function from the caret package. Also apply the explicit coercion as.factor() if your data is not yet stored as a factor.

# your code here
confusionMatrix(data = pred_tuning, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##         0  312   0   0   0   0   0   0   0   0   0   0   2  48   0   0  10   0
##         1    0 370   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         2    0   0 267   0   0   1   0   0   0   0   0   0   0  14   0   0   0
##         3    0   4   0 173   0   0   7   0   0   0   0   0   0   0   0   0   0
##         4    0   0   0   0 414   0   0   0   0   0   0   5  19  10   0   0   0
##         5    0   0  21   0   0 214   0   0   0  25   0   0   0  18   0   9   0
##         6    0   0   0   0   0   0 241  31   0   0   0   0  42  12   0   0   0
##         7    0   0   0   0   0   0  36 404   0   0   0   0   0  19   0   0   0
##         8    0   0   0   0   0   0   2   0 171   0   0   0   0   0   0   0   0
##         9    0  37   0   0   0   2   0   0   0 190   0   0   0   0   0   0   6
##         10   0   0   1   0   0   0   0   0   7   0 206   0   0   0   0   0  21
##         11   0   0   0   0  19   0   0   0   0   0   0 232  21   0   0  11   0
##         12  19   0   0   0   0   0   0   0   0   0   0  24  95   0   0   0   0
##         13   0   0   0   0   0   0   0   0   0   0   0  21   0 144   0   0   0
##         14   0   0   0   0   0   0  24   0   0   0   0   0   0   0 338   0   0
##         15   0   0   0   0   0   0  19   0   0   0   0  16  21  21   9 134   0
##         16   0   0   0  30   0   0  11   0  50  58   0   0   0   0   0   0  97
##         17   0   0   0   0  65   0   0   0  14   0   0  94  45   0   0   0   0
##         18   0   0   1   0   0  19   8   1   0   0   0   0   0   8   0   0   0
##         19   0  20   0   0   0   0   0   0   0   0   0   0   0   0   0   0  20
##         20   0   0   0  15   0   0   0   0   0   0   0   0   0   0   0   0   0
##         21   0   1   0   0   0  11   0   0   0   0   0   0   0   0   0   0   0
##         22   0   0  20  27   0   0   0   0   4   0   3   0   0   0   0   0   0
##         23   0   0   0   0   0   0   0   0  42  58   0   0   0   0   0   0   0
##           Reference
## Prediction  17  18  19  20  21  22  23
##         0    2   0   0   0   0   0   0
##         1    0   0   0   0   0   0   0
##         2    0   0   0   0   0   0   0
##         3    0   0   6   0   0   0   0
##         4   41   0   0   0   0   0   0
##         5    0   0   0  20   0   0   0
##         6    0   0   0   0   0   0   0
##         7   20  20   0   0   0   0   0
##         8   42  20   0   0   0   0  21
##         9    0   0  96  66  16   0  20
##         10   0  20   2   0   0   0  21
##         11  11   0   0   0   0   0   0
##         12  26   0   0   0   0   0   0
##         13   0   0   0   0   0   0   0
##         14   0   6   0   0   0   1   0
##         15   1   0   0   0   0   0   0
##         16   0  21  43  50  29  16  59
##         17 103   0   0   0   0  19   0
##         18   0 131   0   0   0   2  21
##         19   0   3  89   0  19   0   0
##         20   0   0  30 159   0   0   0
##         21   0   0   0  40 142  43   0
##         22   0  26   0  11   0 186   0
##         23   0   1   0   0   0   0 190
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6974          
##                  95% CI : (0.6867, 0.7081)
##     No Information Rate : 0.0694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6837          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity           0.94260  0.85648  0.86129  0.70612  0.83133  0.86640
## Specificity           0.99094  1.00000  0.99781  0.99755  0.98876  0.98657
## Pos Pred Value        0.83422  1.00000  0.94681  0.91053  0.84663  0.69707
## Neg Pred Value        0.99721  0.99089  0.99376  0.98969  0.98743  0.99519
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.04350  0.05159  0.03723  0.02412  0.05772  0.02984
## Detection Prevalence  0.05215  0.05159  0.03932  0.02649  0.06818  0.04281
## Balanced Accuracy     0.96677  0.92824  0.92955  0.85183  0.91004  0.92648
##                      Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.69253  0.92661  0.59375  0.57402   0.98565   0.58883
## Specificity           0.98754  0.98590  0.98765  0.96448   0.98966   0.99085
## Pos Pred Value        0.73926  0.80962  0.66797  0.43880   0.74101   0.78912
## Neg Pred Value        0.98437  0.99520  0.98308  0.97908   0.99956   0.97645
## Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
## Detection Rate        0.03360  0.05633  0.02384  0.02649   0.02872   0.03235
## Detection Prevalence  0.04545  0.06958  0.03569  0.06037   0.03876   0.04099
## Balanced Accuracy     0.84004  0.95625  0.79070  0.76925   0.98765   0.78984
##                      Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity            0.32646   0.58537   0.97406   0.81707   0.67361
## Specificity            0.98997   0.99697   0.99546   0.98759   0.94778
## Pos Pred Value         0.57927   0.87273   0.91599   0.60633   0.20905
## Neg Pred Value         0.97203   0.98544   0.99868   0.99568   0.99299
## Prevalence             0.04057   0.03430   0.04838   0.02287   0.02008
## Detection Rate         0.01325   0.02008   0.04713   0.01868   0.01352
## Detection Prevalence   0.02287   0.02301   0.05145   0.03081   0.06470
## Balanced Accuracy      0.65822   0.79117   0.98476   0.90233   0.81070
##                      Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity            0.41870   0.52823   0.33459   0.45954   0.68932
## Specificity            0.96578   0.99133   0.99102   0.99341   0.98636
## Pos Pred Value         0.30294   0.68586   0.58940   0.77941   0.59916
## Neg Pred Value         0.97907   0.98324   0.97479   0.97316   0.99077
## Prevalence             0.03430   0.03458   0.03709   0.04824   0.02872
## Detection Rate         0.01436   0.01827   0.01241   0.02217   0.01980
## Detection Prevalence   0.04741   0.02663   0.02105   0.02844   0.03305
## Balanced Accuracy      0.69224   0.75978   0.66280   0.72647   0.83784
##                      Class: 22 Class: 23
## Sensitivity            0.69663   0.57229
## Specificity            0.98682   0.98523
## Pos Pred Value         0.67148   0.65292
## Neg Pred Value         0.98825   0.97936
## Prevalence             0.03723   0.04629
## Detection Rate         0.02593   0.02649
## Detection Prevalence   0.03862   0.04057
## Balanced Accuracy      0.84173   0.77876

The hidden layers are where information from the data is extracted. What we can conclude from model_bigger and model_tuning about hidden layers is that the more hidden layers are used (the deeper the network), the more the neural network model tends to overfit.

Note: Consider the following criteria for this case

  • The model is considered quite good if the accuracy reaches >= 70%

  • The model is considered poor if the accuracy is below 70%

  • Model performance is considered balanced in both train and test data if the difference in accuracy is <= 20%

From the three models above (model_base, model_bigger, and model_tuning), the best model for us to pick is model_tuning, because its accuracy is quite high and the difference in accuracy between the train data and the test data is the smallest.
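One way to check the train/test gap mentioned above is to evaluate the chosen model on both sets. Below is a minimal sketch, assuming we first one-hot encode the test labels into a helper object test_y_dummy (not created earlier in this workflow):

# one-hot encode the test labels only for evaluation purposes
test_y_dummy <- to_categorical(test_y)

# accuracy on the train data vs. the test data; a small gap suggests a balanced fit
model_tuning %>% evaluate(x = train_x_array, y = train_y_dummy)
model_tuning %>% evaluate(x = test_x_array, y = test_y_dummy)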

Note: for this case, we consider a model to have high enough accuracy if it obtains an accuracy above 65% on both the train data and the test data.

We have completed the task of building a deep learning model to classify sign language images. This model will be very helpful for people with hearing impairment or loss who communicate with sign language, so that they can communicate with others. This project can be developed further into a sign-language-based communication app.