Sign Language MNIST - Neural Network

Safira Widya Putri

2022-07-04

Introduction

In this project, we perform image classification with a deep learning neural network built in keras. The dataset comes from Kaggle and follows the MNIST format: each sign language image is stored as a row of pixel values in a CSV file. The objective of this project is to classify each sign language image into the correct label.

First, we load the required libraries.

library(dplyr)
library(neuralnet)
library(keras)
library(caret)
library(rsample)

Data Preparation

Load Data

sign_train <- read.csv("sign_mnist_train.csv")
sign_test <- read.csv("sign_mnist_test.csv")

Check the dimensions of the data:

dim(sign_train)
## [1] 27455   785

The training data have 27,455 rows and 785 columns: one label column and 784 pixel columns (28 x 28).

Check for missing values:

anyNA(sign_train)
## [1] FALSE
anyNA(sign_test)
## [1] FALSE

No missing values were found.

Data Pre-Processing

Before building a model with Keras, the data need to be prepared as follows:

  1. Separate the data into the label (y) and the predictors (x).
  2. Convert the data to matrix form with as.matrix().
  3. Scale train_x and test_x by dividing by 255.
  4. Reshape the arrays with array_reshape(x, dim).
  5. One-hot encode train_y and test_y with to_categorical(data, num_classes).

Check the range of a predictor variable (here pixel1):

range(sign_train$pixel1)
## [1]   0 255

Check the target variable:

table(sign_train$label)
## 
##    0    1    2    3    4    5    6    7    8   10   11   12   13   14   15   16 
## 1126 1010 1144 1196  957 1204 1090 1013 1162 1114 1241 1055 1151 1196 1088 1279 
##   17   18   19   20   21   22   23   24 
## 1294 1199 1186 1161 1082 1225 1164 1118
table(sign_test$label)
## 
##   0   1   2   3   4   5   6   7   8  10  11  12  13  14  15  16  17  18  19  20 
## 331 432 310 245 498 247 348 436 288 331 209 394 291 246 347 164 144 246 248 266 
##  21  22  23  24 
## 346 206 267 332

The result above shows that there is no label 9. To make the labels contiguous (0 to 23), we subtract 1 from every label greater than 9.

sign_train <- sign_train %>%
  mutate(label = ifelse(label > 9, label-1, label))

sign_test <- sign_test %>% 
  mutate(label = ifelse(label > 9, label-1, label))

table(sign_train$label)
## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 1126 1010 1144 1196  957 1204 1090 1013 1162 1114 1241 1055 1151 1196 1088 1279 
##   16   17   18   19   20   21   22   23 
## 1294 1199 1186 1161 1082 1225 1164 1118
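
The remapping above can be checked on a tiny vector. The sketch below uses made-up labels (the vector `labels` is hypothetical, not taken from the dataset) and also shows one possible inverse mapping, which may be useful for translating a predicted class back to the original label.

```r
# Toy labels that skip 9, mimicking the gap in the original label column
labels <- c(0, 8, 10, 11, 24)

# Shift every label above 9 down by one so the classes run 0..23 without a gap
remapped <- ifelse(labels > 9, labels - 1, labels)
print(remapped)

# Inverse mapping: remapped classes 9 and above go back up by one
restored <- ifelse(remapped >= 9, remapped + 1, remapped)
print(restored)
```

Without this step, to_categorical() would create a column for the unused label 9, so the one-hot matrix would have 25 columns instead of 24.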

The pixel values are stored in a data frame. We separate the predictors and the target variable of sign_train and sign_test and store them in train_x, train_y, test_x, and test_y. The predictors in train_x and test_x are converted into matrices with as.matrix() and scaled by dividing by 255, the maximum pixel value, so that every value lies between 0 and 1.

# Predictor variables in `sign_train`
train_x <- sign_train %>%
  select(-label) %>%
  as.matrix()/255

# Predictor variables in `sign_test`
test_x <- sign_test %>% 
  select(-label) %>% 
  as.matrix()/255

# Target variable in `sign_train`
train_y <- sign_train$label

# Target variable in `sign_test`
test_y <- sign_test$label
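
As a small illustration of the scaling step, the sketch below uses a made-up pixel matrix (`raw` is hypothetical). It shows that dividing by 255 is the same as min-max scaling when the known range of the pixels is 0 to 255.

```r
# Toy pixel matrix: 2 images x 3 pixels, raw intensities in 0..255
raw <- matrix(c(0, 128, 255,
                64, 32, 16), nrow = 2, byrow = TRUE)

# Scaling used in this project: divide by the maximum possible intensity
scaled <- raw / 255

# General min-max scaling with the known pixel range 0..255
min_max <- (raw - 0) / (255 - 0)

print(range(scaled))
print(all.equal(scaled, min_max))
```

Using the fixed constant 255 rather than the observed maximum keeps the train and test sets on exactly the same scale.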

Next, we convert the predictor matrices into arrays with array_reshape(data, dim(data)). The dimensions stay the same: one row per image and 784 columns.

# Predictor variables in `train_x`
train_x_array <- train_x %>% 
  array_reshape(dim = dim(train_x))

# Predictor variables in `test_x`
test_x_array <- test_x %>% 
  array_reshape(dim = dim(test_x))
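
To see how one row of 784 values relates to a 28x28 image, the sketch below reshapes a single row in base R. The vector `pixel_row` is a random stand-in for one row of train_x, and the row-by-row pixel order is an assumption about how the CSV was written.

```r
# Hypothetical stand-in for one row of train_x: 784 scaled pixel values
set.seed(1)
pixel_row <- runif(784)

# Assumption: pixels are stored row by row, so fill the matrix by row
img <- matrix(pixel_row, nrow = 28, ncol = 28, byrow = TRUE)
print(dim(img))

# image() draws matrix rows along the x axis, so flip and transpose
# to show the picture upright
image(t(img[28:1, ]), col = gray.colors(256, start = 0, end = 1), axes = FALSE)
```

With a real row, for example `train_x[1, ]` in place of `pixel_row`, the plot should show the corresponding hand sign if the assumed pixel order is right.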

Next, we one-hot encode the target variables train_y and test_y with to_categorical() and save the results as train_y_dummy and test_y_dummy.

# Target variable in `train_y`
train_y_dummy <- train_y %>%
  as.matrix() %>% 
  to_categorical()
# Target variable in `test_y`
test_y_dummy <- test_y %>% 
  as.matrix() %>% 
  to_categorical()
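
For readers who want to see what one-hot encoding produces, here is a minimal base R sketch. The helper `one_hot()` is hypothetical and only imitates the idea behind to_categorical() for 0-based integer labels; it is not the keras implementation.

```r
one_hot <- function(labels, num_classes = max(labels) + 1) {
  # One row per observation, one column per class (classes are 0-based)
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Set the column that matches each label to 1
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}

y <- c(0, 2, 1, 2)
encoded <- one_hot(y, num_classes = 3)
print(encoded)
```

Each row contains a single 1 in the column of its class. Passing the number of classes explicitly keeps the train and test matrices the same width even if one of them lacks the highest label.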

Build Model

The next step is to build the Neural Network architecture. The following conditions must be fulfilled when building it:

  1. Always start with keras_model_sequential().
  2. The first layer created will be the first hidden layer.
  3. The input layer is created by entering the input_shape parameter in the first layer.
  4. The last layer created will be the output layer.

First, we create objects that store the number of columns of the predictor variables and the number of categories of the target variable.

# Number of predictors (784 pixels) and number of classes (24)
input_dim <- ncol(train_x)
num_class <- n_distinct(sign_train$label)

# Fix the R random seed for reproducibility
RNGkind(sample.kind = "Rounding")
set.seed(100)
# Seeded weight initializer; note that it is not passed to any layer below
initializer <- initializer_random_normal(seed = 100)

We will build a Neural Network model with the following architecture:

  • Input layer: 784 predictors (28x28 pixel image).
  • Hidden Layer 1: 64 neurons with ReLU activation.
  • Hidden Layer 2: 32 neurons with ReLU activation.
  • Output Layer: 24 neurons (one per category) with softmax activation.

# Stack the dense layers into a sequential model
model_nn <- keras_model_sequential(name="model_nn") %>% 
  
  # input layer + first hidden layer
  layer_dense(units = 64,
              input_shape = input_dim,
              activation = "relu",
              name = "hidden_1") %>% 
  
  # second hidden layer
  layer_dense(units = 32,
              activation = "relu",
              name = "hidden_2") %>% 
  
  # output layer
  layer_dense(units = num_class,
              activation = "softmax",
              name = "output")
model_nn
## Model: "model_nn"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  hidden_1 (Dense)                   (None, 64)                      50240       
##  hidden_2 (Dense)                   (None, 32)                      2080        
##  output (Dense)                     (None, 24)                      792         
## ================================================================================
## Total params: 53,112
## Trainable params: 53,112
## Non-trainable params: 0
## ________________________________________________________________________________
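
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The sketch below uses a hypothetical helper, `dense_params()`, to redo that arithmetic.

```r
dense_params <- function(layer_sizes) {
  # layer_sizes: number of units per layer, starting with the input size
  n_in  <- head(layer_sizes, -1)
  n_out <- tail(layer_sizes, -1)
  # weights (n_in * n_out) plus biases (n_out) for each dense layer
  n_in * n_out + n_out
}

# Architecture of model_nn: 784 inputs -> 64 -> 32 -> 24 outputs
params_nn <- dense_params(c(784, 64, 32, 24))
print(params_nn)
print(sum(params_nn))
```

The three values match the Param # column (50240, 2080, 792), and their sum matches the 53,112 total parameters.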

Model Compile

The next step is to determine the error function, optimizer, and metrics.

Error/Loss Function:

  • Regression: Sum of Squared Errors (SSE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE)
  • Classification with 2 classes: Binary Cross-Entropy
  • Classification with more than 2 classes: Categorical Cross-Entropy

Optimizer:

  • SGD: Stochastic Gradient Descent
  • ADAM: Adam Optimizer
  • learning_rate (lr): the learning rate of the optimizer

Since the target has 24 classes, we use categorical cross-entropy as the loss, with the Adam optimizer and a learning rate of 0.001.

model_nn %>% 
  compile(loss= "categorical_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.001),
          metrics = "accuracy")
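
To make the loss concrete, here is a minimal base R sketch of softmax followed by categorical cross-entropy for a single observation with three classes. The helper functions, the example logits, and the clipping constant `eps` are illustrative assumptions, not the keras internals.

```r
softmax <- function(z) {
  # Subtract the maximum before exponentiating for numerical stability
  e <- exp(z - max(z))
  e / sum(e)
}

categorical_crossentropy <- function(y_true, y_prob, eps = 1e-7) {
  # Clip probabilities away from zero so that log() stays finite
  -sum(y_true * log(pmax(y_prob, eps)))
}

logits <- c(2, 1, 0.1)   # raw scores from an output layer
prob   <- softmax(logits)
y_true <- c(1, 0, 0)     # one-hot target: the true class is the first one
loss   <- categorical_crossentropy(y_true, prob)

print(prob)
print(loss)
```

Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks toward zero as that probability approaches 1.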

Model Fitting

We fit the model using epochs = 10, batch_size = 150, and shuffle = FALSE, with the test set as validation data.

history <- model_nn %>% 
  fit(x = train_x_array,
      y = train_y_dummy,
      epochs = 10,
      batch_size = 150,
      validation_data = list(test_x_array, test_y_dummy),
      shuffle = FALSE,
      verbose = TRUE)

plot(history)
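
The batch size determines how many weight updates happen in each epoch. The arithmetic below is a small sketch that assumes the last, incomplete batch is still used, which is how fitting on in-memory arrays usually behaves.

```r
n_train    <- 27455   # rows in sign_train
batch_size <- 150
epochs     <- 10

# Number of batches (weight updates) per epoch, counting the partial last batch
steps_per_epoch <- ceiling(n_train / batch_size)

# Size of the final batch in each epoch
last_batch_size <- n_train - (steps_per_epoch - 1) * batch_size

# Total number of weight updates over the whole training run
total_updates <- steps_per_epoch * epochs

print(c(steps_per_epoch, last_batch_size, total_updates))
```

Under that assumption, each epoch has 184 steps, the last batch holds only 5 images, and training performs 1,840 updates in total.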

Prediction

To evaluate the model's performance, we predict the test data test_x_array using the trained model.

# Predict class probabilities, then keep the most probable class (0-based)
pred <- predict(model_nn, test_x_array) %>% 
  k_argmax() %>%
  as.array() %>% 
  as.factor()
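
The argmax step can be illustrated in base R. The sketch below uses a made-up probability matrix (`prob` is hypothetical) and max.col(), which returns 1-based column positions, so 1 is subtracted to get 0-based class labels like those from k_argmax().

```r
# Toy predicted probabilities: 3 observations x 3 classes
prob <- matrix(c(0.1, 0.7, 0.2,
                 0.6, 0.3, 0.1,
                 0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)

# Column with the highest probability per row, shifted to 0-based labels
pred_class <- max.col(prob, ties.method = "first") - 1
print(pred_class)

# Fixing the factor levels keeps classes that were never predicted
pred_factor <- factor(pred_class, levels = 0:2)
print(levels(pred_factor))
```

Setting the levels explicitly is a design choice worth considering: if a model never predicts some class, as.factor() alone would drop that level, and the predicted and reference factors would no longer share the same levels.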

Model Evaluation

We evaluate the model using confusionMatrix() from caret.

confusionMatrix(data = pred, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##         0  263   0   0   0   0   0   0   0   8   0   0   1  24   0   0   0   0
##         1    0 328   0  19  37   0   0   1   0   0   0   0   0   0   0   0   0
##         2    0   0 240   0   0   1   0   0   0   0   0   0   0  14   0   0   0
##         3    0   0   0 169   0   1  15   0   0   0   0   0   0   0   0   0   0
##         4    0   0   0   0 301   0   0   0   0   0   0   0   1   3   0   0   0
##         5    0   0  20   0   2 138   0   0   0   0   0   0   0  32   0   0   0
##         6    0   0   0   0   0   0 184  20   0   0   0   0   0  20   0   0   0
##         7    0   0   0   0   0   0  50 366  22   0   0  13   0   0  10   0   0
##         8    0   0   0   0   0   0   0  19 131   1   0   2   0   0   0   0   0
##         9    0  62   0   0   0   0   0   0   0 117   0   0   0   0   0   0   0
##         10   0   0   0  20   0  22  11   0   0  21 209   0   0   0   0   0  21
##         11   0   0   0   0  52   0   0   0   0   0   0 126  45   0   0  17   0
##         12   0   0   0   0  10   0   0   0   0   0   0  67  98   0   0   2   0
##         13  39   0   1   0   0   0   0  19   0   0   0  31  23 127   0   0   0
##         14   0   0   0   0   0   0   0   0   0   7   0   0   0   0 273   0   0
##         15   0   0   0   0   0   0  38   4  41   0   0  47  38  28   9 143   0
##         16   0  14   0  17   0   3   0   0  61 130   0   0   0   0   0   0 103
##         17  29   0   0   0  96   0   0   0   0   0   0 107  40   0   0   2   0
##         18   0   0  23   0   0  39  47   0   8   0   0   0  19   7   0   0   0
##         19   0   0   0   0   0   0   0   7   0   0   0   0   0   0   0   0   3
##         20   0   0   0   0   0  16   0   0   2   1   0   0   0   0   0   0  17
##         21   0  28   0   0   0   6   0   0   0  14   0   0   0   0  33   0   0
##         22   0   0  26  20   0  21   3   0   0   0   0   0   3  15  22   0   0
##         23   0   0   0   0   0   0   0   0  15  40   0   0   0   0   0   0   0
##           Reference
## Prediction  17  18  19  20  21  22  23
##         0    0   0   0   0   0   0   0
##         1    0   0   0   0   0   0   0
##         2    0   0   0   0   0   0   0
##         3   18   5  20   0   0   0  22
##         4   22   0   0   0   0   0   0
##         5    0   0   0   2   0   0   0
##         6    0   5   0   0   0   0   0
##         7   20  19   0   0  21   0   0
##         8   12   2   0   0   0   0  21
##         9    0   0  37  16  62   0   0
##         10   0  83  40  40   0  21  62
##         11  64   0   0   0   0   0   0
##         12   0   0   0   0   0   0   0
##         13   0   0   0   0   0   0   0
##         14   0   0   0   0   0   5   0
##         15  32   1   0   1   0   5   0
##         16   0   0 131  75  59   0  39
##         17  54   0   0   0   0   0   0
##         18   0  74   0   0   0   0  21
##         19   0  20  20   0   1   0   0
##         20   0   0   0 160  42  41   0
##         21   0   0   0   0  21  36   0
##         22   0  39   0  52   0 159   0
##         23  24   0  18   0   0   0 167
## 
## Overall Statistics
##                                                
##                Accuracy : 0.5537               
##                  95% CI : (0.5421, 0.5652)     
##     No Information Rate : 0.0694               
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.5344               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity           0.79456  0.75926  0.77419  0.68980  0.60442  0.55870
## Specificity           0.99518  0.99154  0.99781  0.98831  0.99610  0.99191
## Pos Pred Value        0.88851  0.85195  0.94118  0.67600  0.92049  0.71134
## Neg Pred Value        0.99011  0.98468  0.98988  0.98902  0.97122  0.98438
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.03667  0.04573  0.03346  0.02356  0.04197  0.01924
## Detection Prevalence  0.04127  0.05368  0.03555  0.03486  0.04559  0.02705
## Balanced Accuracy     0.89487  0.87540  0.88600  0.83905  0.80026  0.77531
##                      Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.52874  0.83945  0.45486  0.35347   1.00000   0.31980
## Specificity           0.99341  0.97699  0.99172  0.97413   0.95103   0.97374
## Pos Pred Value        0.80349  0.70250  0.69681  0.39796   0.38000   0.41447
## Neg Pred Value        0.97638  0.98948  0.97752  0.96889   1.00000   0.96098
## Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
## Detection Rate        0.02566  0.05103  0.01827  0.01631   0.02914   0.01757
## Detection Prevalence  0.03193  0.07264  0.02621  0.04099   0.07669   0.04239
## Balanced Accuracy     0.76107  0.90822  0.72329  0.66380   0.97551   0.64677
##                      Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity            0.33677   0.51626   0.78674   0.87195   0.71528
## Specificity            0.98852   0.98368   0.99824   0.96518   0.92473
## Pos Pred Value         0.55367   0.52917   0.95789   0.36951   0.16297
## Neg Pred Value         0.97241   0.98283   0.98926   0.99690   0.99373
## Prevalence             0.04057   0.03430   0.04838   0.02287   0.02008
## Detection Rate         0.01366   0.01771   0.03806   0.01994   0.01436
## Detection Prevalence   0.02468   0.03346   0.03974   0.05396   0.08812
## Balanced Accuracy      0.66264   0.74997   0.89249   0.91857   0.82000
##                      Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity           0.219512   0.29839  0.075188   0.46243  0.101942
## Specificity           0.960439   0.97631  0.995511   0.98257  0.983204
## Pos Pred Value        0.164634   0.31092  0.392157   0.57348  0.152174
## Neg Pred Value        0.971946   0.97491  0.965454   0.97302  0.973699
## Prevalence            0.034300   0.03458  0.037089   0.04824  0.028723
## Detection Rate        0.007529   0.01032  0.002789   0.02231  0.002928
## Detection Prevalence  0.045733   0.03318  0.007111   0.03890  0.019241
## Balanced Accuracy     0.589976   0.63735  0.535350   0.72250  0.542573
##                      Class: 22 Class: 23
## Sensitivity            0.59551   0.50301
## Specificity            0.97089   0.98582
## Pos Pred Value         0.44167   0.63258
## Neg Pred Value         0.98415   0.97611
## Prevalence             0.03723   0.04629
## Detection Rate         0.02217   0.02328
## Detection Prevalence   0.05020   0.03681
## Balanced Accuracy      0.78320   0.74442
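
Several of the statistics above can be recomputed by hand. The sketch below uses a made-up 3-class confusion matrix (`cm` is hypothetical) to show accuracy, sensitivity, and positive predictive value, and then recomputes the No Information Rate from the test label counts shown earlier.

```r
# Toy confusion matrix: rows = predictions, columns = reference
cm <- matrix(c(8, 1, 0,
               2, 7, 1,
               0, 2, 9), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = 0:2, Reference = 0:2))

accuracy       <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions
sensitivity    <- diag(cm) / colSums(cm)    # correct / actual count, per class
pos_pred_value <- diag(cm) / rowSums(cm)    # correct / predicted count, per class

# No Information Rate: share of the largest class in the test labels
test_counts <- c(331, 432, 310, 245, 498, 247, 348, 436, 288, 331, 209, 394,
                 291, 246, 347, 164, 144, 246, 248, 266, 346, 206, 267, 332)
nir <- max(test_counts) / sum(test_counts)

print(accuracy)
print(sensitivity)
print(round(nir, 4))
```

The rounded No Information Rate agrees with the 0.0694 reported above: it is the accuracy obtained by always predicting the most frequent class.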

The accuracy of model_nn is 55.37%, which is below 60%. We will try to tune the model by increasing the number of hidden layers and nodes.

Model Tuning

We will build a tuned model with the following architecture:

  • Input layer: 784 predictors (28x28 pixel image).
  • Hidden Layer 1: 512 neurons with ReLU activation.
  • Hidden Layer 2: 256 neurons with ReLU activation.
  • Hidden Layer 3: 128 neurons with ReLU activation.
  • Hidden Layer 4: 64 neurons with ReLU activation.
  • Output Layer: 24 neurons (one per category) with softmax activation.

model_tunning <- keras_model_sequential(name="model_tunning") %>% 
  
  # input layer + first hidden layer
  layer_dense(units = 512,
              input_shape = input_dim,
              activation = "relu",
              name = "hidden_1") %>% 
  
  # second hidden layer
  layer_dense(units = 256,
              activation = "relu",
              name = "hidden_2") %>% 
  
  # third hidden layer
  layer_dense(units = 128,
              activation = "relu",
              name = "hidden_3") %>% 
  
  # fourth hidden layer
  layer_dense(units = 64,
              activation = "relu",
              name = "hidden_4") %>% 
  
  # output layer
  layer_dense(units = num_class,
              activation = "softmax",
              name = "output")
model_tunning
## Model: "model_tunning"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  hidden_1 (Dense)                   (None, 512)                     401920      
##  hidden_2 (Dense)                   (None, 256)                     131328      
##  hidden_3 (Dense)                   (None, 128)                     32896       
##  hidden_4 (Dense)                   (None, 64)                      8256        
##  output (Dense)                     (None, 24)                      1560        
## ================================================================================
## Total params: 575,960
## Trainable params: 575,960
## Non-trainable params: 0
## ________________________________________________________________________________

Model Compile

model_tunning %>% 
  compile(loss= "categorical_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.001),
          metrics = "accuracy")

Model Fitting

history_tunning <- model_tunning %>% 
  fit(x = train_x_array,  # predictors
      y = train_y_dummy,  # target variable
      epochs = 10,
      batch_size = 150,
      validation_data = list(test_x_array, test_y_dummy),
      shuffle = FALSE,
      verbose = TRUE)

plot(history_tunning)

Prediction

# Predict class probabilities, then keep the most probable class (0-based)
pred_tunning <- predict(model_tunning, test_x_array) %>% 
  k_argmax() %>%
  as.array() %>% 
  as.factor()

Model Evaluation

confusionMatrix(data = pred_tunning, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##         0  331   0   0   0   0   0   0  21   0   0   0  20  42   0   0   0   0
##         1    0 412   0  10   0   0   0   0   0   0   0   0   0   0   0   0   0
##         2    0   0 310   0   0   6   0   0   0   0  29   0   3   0  18   0   0
##         3    0   0   0 206   0   0   0   0   0   0   0   0   0   0   0   0   0
##         4    0   0   0   0 477   0   0   0   0   0   0  41  21   0   0   0   0
##         5    0   0   0   0   0 241  19   0   2   0   2   0   0  21   0   0   0
##         6    0   0   0   0   0   0 226  20   0   0   0   0   0   0   0   0   0
##         7    0   0   0   0   0   0  41 395   0   0   0   0   0  20   0   0   0
##         8    0   0   0   1   0   0  20   0 212   0   0   0   0  21   0   0  21
##         9    0   0   0   0   0   0   0   0   0 187   0   0   0   0   0   0   0
##         10   0   0   0   0   0   0   0   0   0   0 178   0   0   0   0   0   0
##         11   0   0   0   0   0   0   0   0   0   0   0 260  20   0   0   1   0
##         12   0   0   0   1   0   0   0   0   0   0   0  21 136   0   0   0   0
##         13   0   0   0   0   0   0   0   0   0   0   0   0  39 184   0   0   0
##         14   0   0   0   0   0   0   1   0   0   0   0   0   0   0 327   0   0
##         15   0   0   0   0   0   0  41   0   0   0   0   0  30   0   0 163   0
##         16   0   0   0   0   0   0   0   0   0  59   0   0   0   0   0   0  82
##         17   0   0   0   0  21   0   0   0  12  18   0  52   0   0   0   0   0
##         18   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##         19   0  20   0   0   0   0   0   0   0  20   0   0   0   0   0   0   0
##         20   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0  21
##         21   0   0   0   0   0   0   0   0   0  23   0   0   0   0   2   0  20
##         22   0   0   0  27   0   0   0   0  22   0   0   0   0   0   0   0   0
##         23   0   0   0   0   0   0   0   0  40  23   0   0   0   0   0   0   0
##           Reference
## Prediction  17  18  19  20  21  22  23
##         0    0   0   0   0   0   0   0
##         1    0   0   0   0   0   0   0
##         2    0   0   0   0   0   0   0
##         3    0   0  20   0   0   0   0
##         4   42   0   0   0   0   0   0
##         5    0  20   0  27  12   0   0
##         6    0   0   0   0   0   0   0
##         7    0   0   0   0   0   0   0
##         8   18  21   0   1   0   0  26
##         9    0   0  66  23  20   0   0
##         10   0   0   0   0   0  14  21
##         11  41   0   0   0   0   0   0
##         12   0   0   0   0   0   0   0
##         13   0   0   0   0   0   0   0
##         14   0   0   0   8   0   0   0
##         15   5   0   0   0   0   0   0
##         16   0   0  42   0   0   0  10
##         17 134   0   0   0   0   0   0
##         18   0 145   0   1   0   0  21
##         19   0   0  68   7   9   0   0
##         20   0   0  27 197   0   2   0
##         21   0   0   1  63 165  18  14
##         22   0  62   0   0   0 233   0
##         23   6   0  42  19   0   0 240
## 
## Overall Statistics
##                                                
##                Accuracy : 0.7681               
##                  95% CI : (0.7582, 0.7779)     
##     No Information Rate : 0.0694               
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.7573               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity           1.00000  0.95370  1.00000  0.84082  0.95783  0.97571
## Specificity           0.98787  0.99852  0.99184  0.99711  0.98442  0.98513
## Pos Pred Value        0.79952  0.97630  0.84699  0.91150  0.82100  0.70058
## Neg Pred Value        1.00000  0.99704  1.00000  0.99439  0.99681  0.99912
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.04615  0.05745  0.04322  0.02872  0.06651  0.03360
## Detection Prevalence  0.05772  0.05884  0.05103  0.03151  0.08101  0.04796
## Balanced Accuracy     0.99393  0.97611  0.99592  0.91896  0.97112  0.98042
##                      Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.64943  0.90596  0.73611  0.56495   0.85167   0.65990
## Specificity           0.99707  0.99094  0.98126  0.98407   0.99497   0.99085
## Pos Pred Value        0.91870  0.86623  0.62170  0.63176   0.83568   0.80745
## Neg Pred Value        0.98239  0.99390  0.98887  0.97906   0.99555   0.98044
## Prevalence            0.04852  0.06079  0.04016  0.04615   0.02914   0.05494
## Detection Rate        0.03151  0.05508  0.02956  0.02607   0.02482   0.03625
## Detection Prevalence  0.03430  0.06358  0.04755  0.04127   0.02970   0.04490
## Balanced Accuracy     0.82325  0.94845  0.85869  0.77451   0.92332   0.82538
##                      Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity            0.46735   0.74797   0.94236   0.99390   0.56944
## Specificity            0.99680   0.99437   0.99868   0.98916   0.98421
## Pos Pred Value         0.86076   0.82511   0.97321   0.68201   0.42487
## Neg Pred Value         0.97790   0.99108   0.99707   0.99986   0.99112
## Prevalence             0.04057   0.03430   0.04838   0.02287   0.02008
## Detection Rate         0.01896   0.02566   0.04559   0.02273   0.01143
## Detection Prevalence   0.02203   0.03109   0.04685   0.03332   0.02691
## Balanced Accuracy      0.73208   0.87117   0.97052   0.99153   0.77683
##                      Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity            0.54472   0.58468  0.255639   0.56936   0.80097
## Specificity            0.98513   0.99682  0.991891   0.99253   0.97976
## Pos Pred Value         0.56540   0.86826  0.548387   0.79435   0.53922
## Neg Pred Value         0.98385   0.98530  0.971907   0.97848   0.99403
## Prevalence             0.03430   0.03458  0.037089   0.04824   0.02872
## Detection Rate         0.01868   0.02022  0.009481   0.02747   0.02301
## Detection Prevalence   0.03305   0.02328  0.017289   0.03458   0.04267
## Balanced Accuracy      0.76492   0.79075  0.623765   0.78095   0.89036
##                      Class: 22 Class: 23
## Sensitivity            0.87266   0.72289
## Specificity            0.98392   0.98099
## Pos Pred Value         0.67733   0.64865
## Neg Pred Value         0.99502   0.98647
## Prevalence             0.03723   0.04629
## Detection Rate         0.03249   0.03346
## Detection Prevalence   0.04796   0.05159
## Balanced Accuracy      0.92829   0.85194

The accuracy of model_tunning is about 77% (0.7681).

Conclusion

A neural network can classify the sign language images, although not perfectly. Based on the evaluation results, model_tunning is the better of the two models: its accuracy on the test data is 76.81%, compared with 55.37% for model_nn.