Introduction
In this project, we will perform image classification using a deep
learning neural network built with keras. We will use the
Sign Language MNIST dataset from Kaggle.
The dataset follows the MNIST format and stores the sign language images as
pixel values in CSV files. The objective of this project is to classify each sign
language image into the correct label.
First, we load the required libraries.
library(dplyr)
library(neuralnet)
library(keras)
library(caret)
library(rsample)
Data Preparation
Load Data
sign_train <- read.csv("sign_mnist_train.csv")
sign_test <- read.csv("sign_mnist_test.csv")
Check the dimensions of the data:
dim(sign_train)
## [1] 27455 785
The training data has 27,455 rows and 785 columns (one label column and 784 pixel columns).
Check for missing values:
anyNA(sign_train)
## [1] FALSE
anyNA(sign_test)
## [1] FALSE
No missing values found.
Data Pre-Processing
Before building a model with Keras, a few things need to be
prepared as follows:
- Separate the data into the label (y) and the predictors (x).
- Convert the data to matrix form with as.matrix().
- Scale the train_x and test_x data by dividing the pixel values by 255.
- Reshape the predictor matrices into arrays using array_reshape(x, dim).
- One-hot encode the train_y and test_y data using to_categorical(data, num_classes) (see the small sketch after this list).
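As a quick illustration of what the one-hot encoding step produces, here is a minimal sketch on a toy label vector (not the actual data):
# toy example: to_categorical() turns integer labels into a binary indicator matrix
to_categorical(c(0, 1, 2, 1))
# each row has a single 1 in the column of its label, so the result is a 4 x 3 matrix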
Check the range of predictor variables:
range(sign_train$pixel1)
## [1] 0 255
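As an optional sanity check (not part of the original pipeline), one row of pixel values can be reshaped back into a 28x28 image and plotted; the orientation may need adjusting depending on how the pixels were flattened:
# reshape the 784 pixel values of the first training image into a 28 x 28 matrix
img <- matrix(unlist(sign_train[1, -1]), nrow = 28, ncol = 28, byrow = TRUE)
# flip the matrix so image() draws it the right way up
image(t(apply(img, 2, rev)), col = gray.colors(256), axes = FALSE)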
Check the target variable:
table(sign_train$label)
##
## 0 1 2 3 4 5 6 7 8 10 11 12 13 14 15 16
## 1126 1010 1144 1196 957 1204 1090 1013 1162 1114 1241 1055 1151 1196 1088 1279
## 17 18 19 20 21 22 23 24
## 1294 1199 1186 1161 1082 1225 1164 1118
table(sign_test$label)
##
## 0 1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19 20
## 331 432 310 245 498 247 348 436 288 331 209 394 291 246 347 164 144 246 248 266
## 21 22 23 24
## 346 206 267 332
From the result above, there is no label 9. So we subtract 1 from every label greater than 9 so that the labels form a contiguous sequence from 0 to 23.
sign_train <- sign_train %>%
mutate(label = ifelse(label > 9, label-1, label))
sign_test <- sign_test %>%
mutate(label = ifelse(label > 9, label-1, label))
table(sign_train$label)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1126 1010 1144 1196 957 1204 1090 1013 1162 1114 1241 1055 1151 1196 1088 1279
## 16 17 18 19 20 21 22 23
## 1294 1199 1186 1161 1082 1225 1164 1118
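As an optional check, we can verify that both sets now use the same 24 contiguous labels:
# the test labels should now also run from 0 to 23 with no gaps
sort(unique(sign_test$label))
n_distinct(sign_train$label)  # should be 24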
The data contains pixel values stored in a data frame. We must
separate the predictors and the target variable of sign_train
and sign_test, and store them as train_x,
train_y, test_x, and test_y.
After that, train_x, train_y,
test_x, and test_y must be converted into
matrix form using as.matrix(). For the
predictor matrices train_x and
test_x, we also perform feature scaling by dividing by 255,
the maximum pixel value.
# Predictor variables in `sign_train`
train_x <- sign_train %>%
select(-label) %>%
as.matrix()/255
# Predictor variables in `sign_test`
test_x <- sign_test %>%
select(-label) %>%
as.matrix()/255
# Target variable in `sign_train`
train_y <- sign_train$label
# Target variable in `sign_test`
test_y <- sign_test$label
Next, we have to convert the predictor matrices into array form. We
can use the array_reshape(data, dim(data)) to convert the
predictor matrix into an array.
# Predictor variables in `train_x`
train_x_array <- train_x %>%
array_reshape(dim = dim(train_x))
# Predictor variables in `test_x`
test_x_array <- test_x %>%
array_reshape(dim = dim(test_x))
Next, we one-hot encode the target variables train_y and
test_y using to_categorical(), and save the
results as the train_y_dummy and test_y_dummy
objects.
# Target variable in `train_y`
train_y_dummy <- train_y %>%
as.matrix() %>%
to_categorical()
# Target variable in `test_y`
test_y_dummy <- test_y %>%
as.matrix() %>%
to_categorical()
Build Model
The next step is to build the Neural Network architecture. Some conditions must be fulfilled when building a Neural Network architecture with Keras:
- Always start with keras_model_sequential().
- The first layer created will be the first hidden layer.
- The input layer is defined by passing the input_shape parameter to the first layer.
- The last layer created will be the output layer.
First, we create objects to store the number of predictor columns and the number of categories of the target variable.
# number of predictors and number of target classes
input_dim <- ncol(train_x)
num_class <- n_distinct(sign_train$label)
# fix the random number generator kind for reproducibility
RNGkind(sample.kind = "Rounding")
set.seed(100)
# seeded weight initializer (created here but not passed to the layers below)
initializer <- initializer_random_normal(seed = 100)
We will build a Neural Network model with the following conditions:
- Input layer: 784 predictors (28x28 pixel images).
- Hidden layer 1: 64 neurons with ReLU activation.
- Hidden layer 2: 32 neurons with ReLU activation.
- Output layer: 24 neurons (one per category) with softmax activation.
# neural network with two hidden layers
model_nn <- keras_model_sequential(name="model_nn") %>%
# input layer + first hidden layer
layer_dense(units = 64,
input_shape = input_dim,
activation = "relu",
name = "hidden_1") %>%
# second hidden layer
layer_dense(units = 32,
activation = "relu",
name = "hidden_2") %>%
# output layer
layer_dense(units = num_class,
activation = "softmax",
name = "ouput")
model_nn
## Model: "model_nn"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 64) 50240
## hidden_2 (Dense) (None, 32) 2080
## ouput (Dense) (None, 24) 792
## ================================================================================
## Total params: 53,112
## Trainable params: 53,112
## Non-trainable params: 0
## ________________________________________________________________________________
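The parameter counts in the summary can be verified by hand: a dense layer has (number of inputs x number of units) weights plus one bias per unit.
# sanity check of the parameter counts reported above
784 * 64 + 64   # hidden_1: 50240
64 * 32 + 32    # hidden_2: 2080
32 * 24 + 24    # output layer: 792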
Model Compile
The next step is to determine the error function, optimizer, and metrics.
Error/Loss Function:
- Regression: Sum of Squared Error (SSE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE)
- Binary classification (2 classes): Binary Cross-Entropy
- Multiclass classification (more than 2 classes): Categorical Cross-Entropy
Optimizer:
- SGD: Stochastic Gradient Descent
- ADAM: Adam Optimizer
- learning_rate (lr): the learning rate of the optimizer
model_nn %>%
compile(loss= "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = "accuracy")Model Fitting
Fitting model using epoch = 10,
batch_size = 150, and shuffle = F.
history <- model_nn %>%
fit(x = train_x_array,
y = train_y_dummy,
epochs = 10,
validation_data = list(test_x_array, test_y_dummy),
shuffle = F,
verbose = T,
batch_size = 150
)
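Besides the plot below, the per-epoch metrics are stored in the returned history object; a quick optional way to inspect the final epoch:
# final-epoch loss and accuracy for the training and validation data
sapply(history$metrics, tail, 1)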
plot(history)
Prediction
To evaluate the model’s performance, we will predict the test data
test_x_array using the trained model.
# predict class labels on the test set
pred <- predict(model_nn, test_x_array) %>%
k_argmax() %>%
as.array() %>%
as.factor()
Model Evaluation
We evaluate the model using confusionMatrix() from caret.
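As an optional cross-check before the confusion matrix, keras can report the overall test loss and accuracy directly:
# overall loss and accuracy on the test set, straight from keras
model_nn %>% evaluate(test_x_array, test_y_dummy)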
confusionMatrix(data = pred, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0 263 0 0 0 0 0 0 0 8 0 0 1 24 0 0 0 0
## 1 0 328 0 19 37 0 0 1 0 0 0 0 0 0 0 0 0
## 2 0 0 240 0 0 1 0 0 0 0 0 0 0 14 0 0 0
## 3 0 0 0 169 0 1 15 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 301 0 0 0 0 0 0 0 1 3 0 0 0
## 5 0 0 20 0 2 138 0 0 0 0 0 0 0 32 0 0 0
## 6 0 0 0 0 0 0 184 20 0 0 0 0 0 20 0 0 0
## 7 0 0 0 0 0 0 50 366 22 0 0 13 0 0 10 0 0
## 8 0 0 0 0 0 0 0 19 131 1 0 2 0 0 0 0 0
## 9 0 62 0 0 0 0 0 0 0 117 0 0 0 0 0 0 0
## 10 0 0 0 20 0 22 11 0 0 21 209 0 0 0 0 0 21
## 11 0 0 0 0 52 0 0 0 0 0 0 126 45 0 0 17 0
## 12 0 0 0 0 10 0 0 0 0 0 0 67 98 0 0 2 0
## 13 39 0 1 0 0 0 0 19 0 0 0 31 23 127 0 0 0
## 14 0 0 0 0 0 0 0 0 0 7 0 0 0 0 273 0 0
## 15 0 0 0 0 0 0 38 4 41 0 0 47 38 28 9 143 0
## 16 0 14 0 17 0 3 0 0 61 130 0 0 0 0 0 0 103
## 17 29 0 0 0 96 0 0 0 0 0 0 107 40 0 0 2 0
## 18 0 0 23 0 0 39 47 0 8 0 0 0 19 7 0 0 0
## 19 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 3
## 20 0 0 0 0 0 16 0 0 2 1 0 0 0 0 0 0 17
## 21 0 28 0 0 0 6 0 0 0 14 0 0 0 0 33 0 0
## 22 0 0 26 20 0 21 3 0 0 0 0 0 3 15 22 0 0
## 23 0 0 0 0 0 0 0 0 15 40 0 0 0 0 0 0 0
## Reference
## Prediction 17 18 19 20 21 22 23
## 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 18 5 20 0 0 0 22
## 4 22 0 0 0 0 0 0
## 5 0 0 0 2 0 0 0
## 6 0 5 0 0 0 0 0
## 7 20 19 0 0 21 0 0
## 8 12 2 0 0 0 0 21
## 9 0 0 37 16 62 0 0
## 10 0 83 40 40 0 21 62
## 11 64 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0
## 14 0 0 0 0 0 5 0
## 15 32 1 0 1 0 5 0
## 16 0 0 131 75 59 0 39
## 17 54 0 0 0 0 0 0
## 18 0 74 0 0 0 0 21
## 19 0 20 20 0 1 0 0
## 20 0 0 0 160 42 41 0
## 21 0 0 0 0 21 36 0
## 22 0 39 0 52 0 159 0
## 23 24 0 18 0 0 0 167
##
## Overall Statistics
##
## Accuracy : 0.5537
## 95% CI : (0.5421, 0.5652)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.5344
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.79456 0.75926 0.77419 0.68980 0.60442 0.55870
## Specificity 0.99518 0.99154 0.99781 0.98831 0.99610 0.99191
## Pos Pred Value 0.88851 0.85195 0.94118 0.67600 0.92049 0.71134
## Neg Pred Value 0.99011 0.98468 0.98988 0.98902 0.97122 0.98438
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.03667 0.04573 0.03346 0.02356 0.04197 0.01924
## Detection Prevalence 0.04127 0.05368 0.03555 0.03486 0.04559 0.02705
## Balanced Accuracy 0.89487 0.87540 0.88600 0.83905 0.80026 0.77531
## Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity 0.52874 0.83945 0.45486 0.35347 1.00000 0.31980
## Specificity 0.99341 0.97699 0.99172 0.97413 0.95103 0.97374
## Pos Pred Value 0.80349 0.70250 0.69681 0.39796 0.38000 0.41447
## Neg Pred Value 0.97638 0.98948 0.97752 0.96889 1.00000 0.96098
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.02566 0.05103 0.01827 0.01631 0.02914 0.01757
## Detection Prevalence 0.03193 0.07264 0.02621 0.04099 0.07669 0.04239
## Balanced Accuracy 0.76107 0.90822 0.72329 0.66380 0.97551 0.64677
## Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity 0.33677 0.51626 0.78674 0.87195 0.71528
## Specificity 0.98852 0.98368 0.99824 0.96518 0.92473
## Pos Pred Value 0.55367 0.52917 0.95789 0.36951 0.16297
## Neg Pred Value 0.97241 0.98283 0.98926 0.99690 0.99373
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008
## Detection Rate 0.01366 0.01771 0.03806 0.01994 0.01436
## Detection Prevalence 0.02468 0.03346 0.03974 0.05396 0.08812
## Balanced Accuracy 0.66264 0.74997 0.89249 0.91857 0.82000
## Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity 0.219512 0.29839 0.075188 0.46243 0.101942
## Specificity 0.960439 0.97631 0.995511 0.98257 0.983204
## Pos Pred Value 0.164634 0.31092 0.392157 0.57348 0.152174
## Neg Pred Value 0.971946 0.97491 0.965454 0.97302 0.973699
## Prevalence 0.034300 0.03458 0.037089 0.04824 0.028723
## Detection Rate 0.007529 0.01032 0.002789 0.02231 0.002928
## Detection Prevalence 0.045733 0.03318 0.007111 0.03890 0.019241
## Balanced Accuracy 0.589976 0.63735 0.535350 0.72250 0.542573
## Class: 22 Class: 23
## Sensitivity 0.59551 0.50301
## Specificity 0.97089 0.98582
## Pos Pred Value 0.44167 0.63258
## Neg Pred Value 0.98415 0.97611
## Prevalence 0.03723 0.04629
## Detection Rate 0.02217 0.02328
## Detection Prevalence 0.05020 0.03681
## Balanced Accuracy 0.78320 0.74442
The accuracy of model_nn is below 60% (about 55% on the test set). We will try to
tune the model by adding more hidden layers and nodes.
Model Tuning
We will build a tuned model, model_tunning, with the following conditions:
- Input layer: 784 predictors (28x28 pixel images).
- Hidden layer 1: 512 neurons with ReLU activation.
- Hidden layer 2: 256 neurons with ReLU activation.
- Hidden layer 3: 128 neurons with ReLU activation.
- Hidden layer 4: 64 neurons with ReLU activation.
- Output layer: 24 neurons (one per category) with softmax activation.
model_tunning <- keras_model_sequential(name="model_tunning") %>%
# input layer + first hidden layer
layer_dense(units = 512,
input_shape = input_dim,
activation = "relu",
name = "hidden_1") %>%
# second hidden layer
layer_dense(units = 256,
activation = "relu",
name = "hidden_2") %>%
# third hidden layer
layer_dense(units = 128,
activation = "relu",
name = "hidden_3") %>%
# fourth hidden layer
layer_dense(units = 64,
activation = "relu",
name = "hidden_4") %>%
# output layer
layer_dense(units = num_class,
activation = "softmax",
name = "ouput")
model_tunning
## Model: "model_tunning"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 512) 401920
## hidden_2 (Dense) (None, 256) 131328
## hidden_3 (Dense) (None, 128) 32896
## hidden_4 (Dense) (None, 64) 8256
## ouput (Dense) (None, 24) 1560
## ================================================================================
## Total params: 575,960
## Trainable params: 575,960
## Non-trainable params: 0
## ________________________________________________________________________________
Model Compile
model_tunning %>%
compile(loss= "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = "accuracy")Model Fitting
history_tunning <- model_tunning %>%
fit(x = train_x_array, # predictors
y = train_y_dummy, # target variable
epochs = 10,
validation_data = list(test_x_array, test_y_dummy),
shuffle = F,
verbose = T,
batch_size = 150
)
plot(history_tunning)
Prediction
# predict class labels on the test set with the tuned model
pred_tunning <- predict(model_tunning, test_x_array) %>%
k_argmax() %>%
as.array() %>%
as.factor()
Model Evaluation
confusionMatrix(data = pred_tunning, reference = as.factor(sign_test$label))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0 331 0 0 0 0 0 0 21 0 0 0 20 42 0 0 0 0
## 1 0 412 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 310 0 0 6 0 0 0 0 29 0 3 0 18 0 0
## 3 0 0 0 206 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 477 0 0 0 0 0 0 41 21 0 0 0 0
## 5 0 0 0 0 0 241 19 0 2 0 2 0 0 21 0 0 0
## 6 0 0 0 0 0 0 226 20 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 41 395 0 0 0 0 0 20 0 0 0
## 8 0 0 0 1 0 0 20 0 212 0 0 0 0 21 0 0 21
## 9 0 0 0 0 0 0 0 0 0 187 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 0 178 0 0 0 0 0 0
## 11 0 0 0 0 0 0 0 0 0 0 0 260 20 0 0 1 0
## 12 0 0 0 1 0 0 0 0 0 0 0 21 136 0 0 0 0
## 13 0 0 0 0 0 0 0 0 0 0 0 0 39 184 0 0 0
## 14 0 0 0 0 0 0 1 0 0 0 0 0 0 0 327 0 0
## 15 0 0 0 0 0 0 41 0 0 0 0 0 30 0 0 163 0
## 16 0 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 82
## 17 0 0 0 0 21 0 0 0 12 18 0 52 0 0 0 0 0
## 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 19 0 20 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 21
## 21 0 0 0 0 0 0 0 0 0 23 0 0 0 0 2 0 20
## 22 0 0 0 27 0 0 0 0 22 0 0 0 0 0 0 0 0
## 23 0 0 0 0 0 0 0 0 40 23 0 0 0 0 0 0 0
## Reference
## Prediction 17 18 19 20 21 22 23
## 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 20 0 0 0 0
## 4 42 0 0 0 0 0 0
## 5 0 20 0 27 12 0 0
## 6 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0
## 8 18 21 0 1 0 0 26
## 9 0 0 66 23 20 0 0
## 10 0 0 0 0 0 14 21
## 11 41 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0
## 14 0 0 0 8 0 0 0
## 15 5 0 0 0 0 0 0
## 16 0 0 42 0 0 0 10
## 17 134 0 0 0 0 0 0
## 18 0 145 0 1 0 0 21
## 19 0 0 68 7 9 0 0
## 20 0 0 27 197 0 2 0
## 21 0 0 1 63 165 18 14
## 22 0 62 0 0 0 233 0
## 23 6 0 42 19 0 0 240
##
## Overall Statistics
##
## Accuracy : 0.7681
## 95% CI : (0.7582, 0.7779)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.7573
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.00000 0.95370 1.00000 0.84082 0.95783 0.97571
## Specificity 0.98787 0.99852 0.99184 0.99711 0.98442 0.98513
## Pos Pred Value 0.79952 0.97630 0.84699 0.91150 0.82100 0.70058
## Neg Pred Value 1.00000 0.99704 1.00000 0.99439 0.99681 0.99912
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.04615 0.05745 0.04322 0.02872 0.06651 0.03360
## Detection Prevalence 0.05772 0.05884 0.05103 0.03151 0.08101 0.04796
## Balanced Accuracy 0.99393 0.97611 0.99592 0.91896 0.97112 0.98042
## Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity 0.64943 0.90596 0.73611 0.56495 0.85167 0.65990
## Specificity 0.99707 0.99094 0.98126 0.98407 0.99497 0.99085
## Pos Pred Value 0.91870 0.86623 0.62170 0.63176 0.83568 0.80745
## Neg Pred Value 0.98239 0.99390 0.98887 0.97906 0.99555 0.98044
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.03151 0.05508 0.02956 0.02607 0.02482 0.03625
## Detection Prevalence 0.03430 0.06358 0.04755 0.04127 0.02970 0.04490
## Balanced Accuracy 0.82325 0.94845 0.85869 0.77451 0.92332 0.82538
## Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity 0.46735 0.74797 0.94236 0.99390 0.56944
## Specificity 0.99680 0.99437 0.99868 0.98916 0.98421
## Pos Pred Value 0.86076 0.82511 0.97321 0.68201 0.42487
## Neg Pred Value 0.97790 0.99108 0.99707 0.99986 0.99112
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008
## Detection Rate 0.01896 0.02566 0.04559 0.02273 0.01143
## Detection Prevalence 0.02203 0.03109 0.04685 0.03332 0.02691
## Balanced Accuracy 0.73208 0.87117 0.97052 0.99153 0.77683
## Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity 0.54472 0.58468 0.255639 0.56936 0.80097
## Specificity 0.98513 0.99682 0.991891 0.99253 0.97976
## Pos Pred Value 0.56540 0.86826 0.548387 0.79435 0.53922
## Neg Pred Value 0.98385 0.98530 0.971907 0.97848 0.99403
## Prevalence 0.03430 0.03458 0.037089 0.04824 0.02872
## Detection Rate 0.01868 0.02022 0.009481 0.02747 0.02301
## Detection Prevalence 0.03305 0.02328 0.017289 0.03458 0.04267
## Balanced Accuracy 0.76492 0.79075 0.623765 0.78095 0.89036
## Class: 22 Class: 23
## Sensitivity 0.87266 0.72289
## Specificity 0.98392 0.98099
## Pos Pred Value 0.67733 0.64865
## Neg Pred Value 0.99502 0.98647
## Prevalence 0.03723 0.04629
## Detection Rate 0.03249 0.03346
## Detection Prevalence 0.04796 0.05159
## Balanced Accuracy 0.92829 0.85194
The accuracy of model_tunning is around 77%.
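For readability, the numeric predictions can be mapped back to their ASL letters. A minimal sketch, assuming the standard Sign Language MNIST convention where label 0 corresponds to A and the letters J and Z are excluded:
# assumed mapping: shifted labels 0-23 correspond to the letters A-Y without J
letter_levels <- LETTERS[1:25][-10]
pred_letters <- letter_levels[as.integer(as.character(pred_tunning)) + 1]
head(pred_letters)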
Conclusion
We can classify each sign language image into the correct label using
a Neural Network. Based on the evaluation results, model_tunning,
with its deeper architecture, is the better model, improving test
accuracy from about 55% to about 77%.