We need to build our classification model to classify the categories
of sign language images using a neural network algorithm in
the keras framework by following these steps:
Let us start our neural network experience by first preparing the
dataset. We will use the sign-language-mnist dataset, which can
be downloaded from the
following page. The data to download are
sign-mnist-train.csv as the train data and
sign-mnist-test.csv as the test data. Both files store
sign language images measuring 28 x 28 pixels across 24 different
categories.
1.1 Load the library and data
Please load the following package.
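The library chunk itself is not shown here; based on the functions used throughout this notebook, a minimal sketch of the packages to load would be:
# assumed packages (the original chunk is not shown)
library(dplyr)      # mutate(), select(), %>%
library(keras)      # layers, to_categorical(), fit()
library(tensorflow) # set_random_seed()
library(caret)      # confusionMatrix()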
In this phase, please load and check the
sign-mnist-train.csv and sign-mnist-test.csv
data, then store them as sign_train and
sign_test.
# your code here
sign_train <- read.csv("datasets/sign_mnist_train/sign_mnist_train.csv")
sign_test <- read.csv("datasets/sign_mnist_test/sign_mnist_test.csv")
# your code here (check dimensions)
dim(sign_train)
## [1] 27455 785
dim(sign_test)
## [1] 7172 785
Inspect the sign_train data using the head()
function.
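The inspection chunk itself is not shown; a minimal sketch would simply be:
# your code here (assumed)
head(sign_train)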
The sign_train data consists of 27455 observations and
785 variables (1 target and 784 predictors). Each predictor represents
one pixel of the image.
1.2 Fix the categories on the target variable
Check the categories of the target variable in both the
sign_train and sign_test data using the
unique() function.
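The chunk itself is not shown; calls along these lines (a sketch) produce the output below:
# your code here (assumed)
unique(sign_train$label)
unique(sign_test$label)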
## [1] 3 6 2 13 16 8 22 18 10 20 17 19 21 23 24 1 12 11 15 4 0 5 7 14
## [1] 6 5 10 0 3 21 14 7 8 12 4 22 2 15 1 13 19 18 23 17 20 16 11 24
We need to fix the categories of the target variable in both the
sign_train and sign_test data. Since labels 9
and 25 are missing, we can subtract 1 from all labels greater than 9. In
this way, our labels become all integers from 0 to 23. We can use the
mutate() and ifelse() functions to fix the
categories of the target variable in both the sign_train and
sign_test data.
Use the code below to fix the categories on the target
variable in the sign_train and sign_test
data.
sign_train <- sign_train %>%
mutate(label = ifelse(label > 9, label-1, label))
sign_test <- sign_test %>%
mutate(label = ifelse(label > 9, label-1, label))
# function to visualize the image data from the csv
vizTrain <- function(input){
dimmax <- sqrt(ncol(input[,-1])) # image width/height in pixels (28)
dimn <- ceiling(sqrt(nrow(input))) # grid dimension for the plotting panel
par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
for (i in 1:nrow(input)){
m1 <- as.matrix(input[i,2:785]) # pixel values of one image
dim(m1) <- c(28,28) # reshape into a 28 x 28 matrix
m1 <- apply(apply(m1, 1, rev), 1, t) # orient the matrix for image()
image(1:28, 1:28,
m1, col=grey.colors(255),
# remove axis text
xaxt = 'n', yaxt = 'n')
text(2, 20, col="white", cex=1.2, input[i, 1]) # overlay the label
}
}
vizTrain(head(sign_train, 36))
1.3 Separate predictors and targets, convert data into matrices, and apply feature scaling
The data contains pixel values stored in a
data.frame. However, we have to separate the predictors
and targets in the sign_train and sign_test data
and store them as train_x, train_y,
test_x, and test_y. We can use the
select() function to separate the predictors and targets in the
sign_train and sign_test data.
After that, convert train_x, train_y,
test_x, and test_y into matrices before we
create a model. Please convert the data into matrix format using the
data.matrix() function. For the predictor variables
stored in train_x and test_x, also perform
feature scaling by dividing by 255.
# Predictor variables in `sign_train`
train_x <- sign_train %>%
select(-label) %>%
as.matrix()/255
# Predictor variables in `sign_test`
test_x <- sign_test %>%
select(-label) %>%
as.matrix()/255
# Target variable in `sign_train`
train_y <- sign_train$label
# Target variable in `sign_test`
test_y <- sign_test$label
range(train_x)
## [1] 0 1
range(test_x)
## [1] 0 1
If we inspect an image in the raw training set, we will see that the pixel values fall in the range 0 to 255. The purpose of dividing the values in the array by 255 is to normalize them from the 0 to 255 range into the 0 to 1 range, as confirmed by the output above.
1.4 Converting matrix to array
Next, we have to convert the predictor matrices into array form. We
can use the array_reshape(data, dim(data)) function to do
this.
# Predictor variables in `train_x`
train_x_array <- array_reshape(x = train_x, dim = dim(train_x))
# Predictor variables in `test_x`
test_x_array <- array_reshape(x = test_x, dim = dim(test_x))
We should also one-hot encode the target variable
(train_y) using the to_categorical() function from
keras and store the result as the train_y_dummy
object.
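The encoding chunk itself is not shown; a minimal sketch, assuming the 24 categories (labels 0 to 23) produced above:
# your code here (assumed; the original chunk is not shown)
train_y_dummy <- to_categorical(train_y, num_classes = 24)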
2.1 Build a base model using keras_model_sequential()
To organize the layers, we should create a base model, which is a
sequential model. Call the keras_model_sequential() function
and pipe the base model into the model architecture.
2.2 Building Architecture (define layers, neurons, and activation function)
To define the architecture for each layer, we will build several models by tuning several parameters.
First, create a model (store it as model_base) by
defining the following parameters:
the first layer contains 64 nodes, relu activation function, 784 input shape
the second layer contains 32 nodes, relu activation function
the third layer contains 24 nodes, softmax activation function
But before building the architecture, we set the randomness of the
initial weights with set_random_seed() from
tensorflow. Make sure to run the entire chunk at once so the seed
takes effect.
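Note that the chunks below refer to input_dim and num_class, which are not defined in the excerpts shown here. A minimal sketch of how they could be set, assuming the 784 pixel predictors and 24 categories described above:
# assumed helper objects (not shown in the original chunks)
input_dim <- ncol(train_x) # 784 predictors (28 x 28 pixels)
num_class <- n_distinct(sign_train$label) # 24 categories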
# your code here
tensorflow::set_random_seed(8)
model_base <- keras_model_sequential(name = "Model-Base") %>%
# input layer + hidden layer 1
layer_dense(input_shape = input_dim, # input size (number of predictors)
units = 64, # number of nodes in this layer
activation = 'relu', # activation function
name = 'hidden_1') %>%
# hidden layer 2
layer_dense(units = 32,
activation = 'relu',
name = "hidden_2") %>%
# output layer
layer_dense(units = num_class,
activation = 'softmax',
name='output')
summary(model_base)
## Model: "Model-Base"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 64) 50240
## hidden_2 (Dense) (None, 32) 2080
## output (Dense) (None, 24) 792
## ================================================================================
## Total params: 53,112
## Trainable params: 53,112
## Non-trainable params: 0
## ________________________________________________________________________________
Second, create a model (store it as model_bigger) by
defining the following parameters:
the first layer contains 256 nodes, relu activation function, 784 input shape
the second layer contains 128 nodes, relu activation function
the third layer contains 64 nodes, relu activation function
the fourth layer contains 24 nodes, softmax activation function
# your code here
tensorflow::set_random_seed(8)
model_bigger <- keras_model_sequential(name = "Model-Bigger") %>%
# input layer + hidden layer 1
layer_dense(input_shape = input_dim, # input size (number of predictors)
units = 256, # number of nodes in this layer
activation = 'relu', # activation function
name = 'hidden_1') %>%
# hidden layer 2
layer_dense(units = 128,
activation = 'relu',
name = "hidden_2") %>%
# hidden layer 3
layer_dense(units = 64,
activation = 'relu',
name = "hidden_3") %>%
# output layer
layer_dense(units = num_class,
activation = 'softmax',
name='output')
summary(model_bigger)
## Model: "Model-Bigger"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 256) 200960
## hidden_2 (Dense) (None, 128) 32896
## hidden_3 (Dense) (None, 64) 8256
## output (Dense) (None, 24) 1560
## ================================================================================
## Total params: 243,672
## Trainable params: 243,672
## Non-trainable params: 0
## ________________________________________________________________________________
2.3 Building Architecture (define cost function and optimizer)
We still need to configure several settings before training
model_base and model_bigger. We must compile
each model by defining the loss function, the optimizer
type, and the evaluation metrics. Please compile
model_base and model_bigger by setting these
parameters:
categorical_crossentropy as the loss function
optimizer_adam as the optimizer with learning rate of 0.001
use accuracy as the evaluation metric
# your code here
model_base %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = 'accuracy'
)
# your code here
model_bigger %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = 'accuracy'
)
2.4 Fitting model in the training set (define epoch and batch size)
In this step, we fit our models using epoch = 10,
batch_size = 150, and the parameter
shuffle = F so that the samples in each batch are taken in
order (sequentially) rather than randomly, for both
model_base and model_bigger. We save the
fitted results as history_base and
history_bigger.
# your code here
history_base <- model_base %>% fit(x = train_x_array,
y = train_y_dummy,
epochs = 10,
batch_size = 150,
shuffle = F,
verbose = 1)
plot(history_base)
# your code here
history_bigger <- model_bigger %>% fit(x = train_x_array,
y = train_y_dummy,
epochs = 10,
batch_size = 150,
shuffle = F,
verbose = 1)
plot(history_bigger)
Note: In the fitting above, epoch = 10 means the model performs the
feed-forward and back-propagation pass over all batches 10 times.
To evaluate the model performance on unseen data, we will predict the
testing data (test_x_array) using the trained models.
Please predict using the predict() function and store the results as
pred_base and pred_bigger.
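The prediction chunk is not shown here; a minimal sketch, mirroring the pred_tuning chunk that appears later in this notebook:
# your code here (assumed; mirrors the pred_tuning chunk below)
pred_base <- predict(object = model_base, x = test_x_array) %>%
k_argmax() %>%
as.array() %>%
as.factor()
pred_bigger <- predict(object = model_bigger, x = test_x_array) %>%
k_argmax() %>%
as.array() %>%
as.factor()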
4.1 Confusion Matrix (classification)
We can evaluate the models using several metrics. Check the
accuracy by creating a confusion matrix. We can use
confusionMatrix() from the caret package. Also do
the explicit coercion as.factor if your data is
not yet stored as a factor.
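The evaluation chunk itself is not shown; a minimal sketch, assuming the pred_base and pred_bigger objects from the previous step, which would produce the two confusion matrices below:
# your code here (assumed; the original chunk is not shown)
confusionMatrix(data = pred_base, reference = as.factor(test_y))
confusionMatrix(data = pred_bigger, reference = as.factor(test_y))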
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0 323 0 0 0 0 0 0 0 3 0 0 2 61 0 0 21 0
## 1 0 307 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0
## 2 0 0 251 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 149 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 1 0 0 0 412 0 0 21 0 0 0 63 28 32 0 0 0
## 5 0 0 21 0 0 194 5 0 0 0 4 0 0 53 0 6 0
## 6 0 0 17 0 0 0 180 39 0 0 0 0 0 20 0 0 0
## 7 0 0 0 0 0 0 64 338 0 0 0 6 0 17 0 1 0
## 8 0 0 0 0 0 0 23 6 203 21 0 0 1 0 2 0 1
## 9 0 92 0 0 0 0 0 0 0 200 0 0 0 0 4 0 1
## 10 0 0 0 0 0 0 0 0 0 0 205 0 0 0 0 0 20
## 11 0 0 0 0 0 0 0 0 0 0 0 111 0 0 0 0 0
## 12 0 0 0 0 0 0 18 0 5 0 0 42 79 14 0 13 0
## 13 1 0 0 0 0 0 0 2 0 0 0 21 18 99 0 0 0
## 14 0 0 0 0 0 0 19 0 0 0 0 0 0 0 317 0 0
## 15 0 0 0 0 0 0 5 0 0 0 0 19 19 5 3 123 0
## 16 0 9 0 37 0 0 0 0 2 83 0 0 0 0 0 0 81
## 17 6 21 0 7 86 0 0 0 33 0 0 130 56 0 0 0 20
## 18 0 0 21 0 0 14 19 13 0 0 0 0 24 1 0 0 0
## 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 3 0 18 0 18 0 17 0 0 0 0 0 3 0 0 21
## 21 0 0 0 0 0 21 0 0 12 0 0 0 0 0 16 0 0
## 22 0 0 0 34 0 0 15 0 1 0 0 0 5 2 5 0 0
## 23 0 0 0 0 0 0 0 0 21 27 0 0 0 0 0 0 0
## Reference
## Prediction 17 18 19 20 21 22 23
## 0 0 0 0 0 0 0 0
## 1 0 0 0 0 6 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 14 0 0 0 0
## 4 21 0 0 0 0 0 0
## 5 0 0 0 20 0 0 0
## 6 0 0 0 0 0 0 0
## 7 0 21 0 0 0 0 1
## 8 114 22 0 0 0 22 82
## 9 0 0 88 38 28 0 0
## 10 0 27 0 0 0 21 41
## 11 43 0 0 0 0 0 0
## 12 6 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0
## 14 0 9 0 0 0 18 0
## 15 0 0 0 0 0 0 0
## 16 0 3 74 72 4 12 0
## 17 62 0 0 0 0 0 40
## 18 0 81 0 0 0 0 0
## 19 0 13 21 0 0 0 0
## 20 0 8 67 195 89 7 21
## 21 0 13 0 1 79 44 0
## 22 0 51 0 20 0 143 0
## 23 0 0 2 0 0 0 147
##
## Overall Statistics
##
## Accuracy : 0.5996
## 95% CI : (0.5881, 0.6109)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5812
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.97583 0.71065 0.80968 0.60816 0.82731 0.78543
## Specificity 0.98728 0.99792 1.00000 0.99798 0.97513 0.98426
## Pos Pred Value 0.78780 0.95639 1.00000 0.91411 0.71280 0.64026
## Neg Pred Value 0.99882 0.98175 0.99148 0.98630 0.98696 0.99228
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.04504 0.04281 0.03500 0.02078 0.05745 0.02705
## Detection Prevalence 0.05717 0.04476 0.03500 0.02273 0.08059 0.04225
## Balanced Accuracy 0.98156 0.85429 0.90484 0.80307 0.90122 0.88484
## Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity 0.51724 0.77523 0.70486 0.60423 0.98086 0.28173
## Specificity 0.98886 0.98367 0.95729 0.96331 0.98435 0.99366
## Pos Pred Value 0.70312 0.75446 0.40845 0.44346 0.65287 0.72078
## Neg Pred Value 0.97571 0.98543 0.98727 0.98051 0.99942 0.95968
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.02510 0.04713 0.02830 0.02789 0.02858 0.01548
## Detection Prevalence 0.03569 0.06247 0.06930 0.06288 0.04378 0.02147
## Balanced Accuracy 0.75305 0.87945 0.83108 0.78377 0.98260 0.63769
## Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity 0.27148 0.40244 0.91354 0.75000 0.56250
## Specificity 0.98576 0.99394 0.99326 0.99272 0.95788
## Pos Pred Value 0.44633 0.70213 0.87328 0.70690 0.21485
## Neg Pred Value 0.96969 0.97909 0.99559 0.99414 0.99073
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008
## Detection Rate 0.01102 0.01380 0.04420 0.01715 0.01129
## Detection Prevalence 0.02468 0.01966 0.05061 0.02426 0.05257
## Balanced Accuracy 0.62862 0.69819 0.95340 0.87136 0.76019
## Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity 0.252033 0.32661 0.078947 0.56358 0.38350
## Specificity 0.942391 0.98671 0.998118 0.96015 0.98464
## Pos Pred Value 0.134490 0.46821 0.617647 0.41756 0.42473
## Neg Pred Value 0.972582 0.97614 0.965677 0.97748 0.98182
## Prevalence 0.034300 0.03458 0.037089 0.04824 0.02872
## Detection Rate 0.008645 0.01129 0.002928 0.02719 0.01102
## Detection Prevalence 0.064278 0.02412 0.004741 0.06511 0.02593
## Balanced Accuracy 0.597212 0.65666 0.538532 0.76187 0.68407
## Class: 22 Class: 23
## Sensitivity 0.53558 0.44277
## Specificity 0.98074 0.99269
## Pos Pred Value 0.51812 0.74619
## Neg Pred Value 0.98202 0.97348
## Prevalence 0.03723 0.04629
## Detection Rate 0.01994 0.02050
## Detection Prevalence 0.03848 0.02747
## Balanced Accuracy 0.75816 0.71773
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0 331 0 0 0 0 0 0 0 40 0 0 20 42 0 0 0 0
## 1 0 400 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 268 0 0 16 0 0 0 0 0 0 0 3 21 0 0
## 3 0 4 0 204 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 423 0 0 0 0 0 0 21 0 0 0 0 0
## 5 0 0 21 0 0 231 0 0 0 39 0 0 0 21 0 0 0
## 6 0 0 0 0 0 0 194 3 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 56 395 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 180 0 0 0 0 0 0 0 20
## 9 0 21 0 0 0 0 0 0 0 155 0 0 0 0 0 0 10
## 10 0 0 0 0 0 0 0 0 0 0 148 0 0 0 0 0 21
## 11 0 0 0 0 0 0 0 21 0 0 0 183 2 0 0 21 0
## 12 0 0 0 0 0 0 16 0 0 0 0 76 125 0 0 2 19
## 13 0 0 0 0 0 0 20 0 0 0 0 2 39 165 0 0 0
## 14 0 0 0 0 0 0 0 0 0 4 0 0 0 0 289 0 0
## 15 0 0 0 0 0 0 26 0 8 0 0 24 21 36 2 141 0
## 16 0 0 0 0 0 0 0 0 0 39 0 0 0 0 0 0 57
## 17 0 3 0 0 75 0 0 0 14 15 0 68 61 0 35 0 16
## 18 0 0 15 0 0 0 36 17 0 0 1 0 1 21 0 0 0
## 19 0 2 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 21 0 2 0 0 0 0 0 0 2 18 0 0 0 0 0 0 0
## 22 0 0 6 41 0 0 0 0 19 0 60 0 0 0 0 0 0
## 23 0 0 0 0 0 0 0 0 25 41 0 0 0 0 0 0 1
## Reference
## Prediction 17 18 19 20 21 22 23
## 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 83 0 0 0 4
## 4 21 0 0 0 0 0 0
## 5 0 0 0 20 4 0 0
## 6 0 0 0 29 0 0 0
## 7 20 21 0 0 0 0 0
## 8 27 21 4 1 20 0 22
## 9 0 0 51 61 5 0 22
## 10 0 0 0 0 0 0 21
## 11 25 0 0 0 0 0 0
## 12 12 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0
## 14 0 0 0 18 0 21 0
## 15 1 0 0 1 0 0 0
## 16 0 0 38 1 3 0 4
## 17 140 0 0 5 0 0 20
## 18 0 141 0 27 0 2 21
## 19 0 0 40 0 28 0 0
## 20 0 0 23 117 0 0 0
## 21 0 0 0 40 146 14 0
## 22 0 65 0 5 0 230 0
## 23 0 0 27 21 0 0 218
##
## Overall Statistics
##
## Accuracy : 0.6861
## 95% CI : (0.6753, 0.6969)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6718
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.00000 0.92593 0.86452 0.83265 0.84940 0.93522
## Specificity 0.98509 1.00000 0.99417 0.98686 0.99371 0.98484
## Pos Pred Value 0.76443 1.00000 0.87013 0.69153 0.90968 0.68750
## Neg Pred Value 1.00000 0.99527 0.99388 0.99404 0.98882 0.99766
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.04615 0.05577 0.03737 0.02844 0.05898 0.03221
## Detection Prevalence 0.06037 0.05577 0.04294 0.04113 0.06484 0.04685
## Balanced Accuracy 0.99254 0.96296 0.92934 0.90976 0.92155 0.96003
## Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity 0.55747 0.90596 0.62500 0.46828 0.70813 0.46447
## Specificity 0.99531 0.98560 0.98329 0.97515 0.99397 0.98982
## Pos Pred Value 0.85841 0.80285 0.61017 0.47692 0.77895 0.72619
## Neg Pred Value 0.97783 0.99386 0.98430 0.97430 0.99126 0.96951
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.02705 0.05508 0.02510 0.02161 0.02064 0.02552
## Detection Prevalence 0.03151 0.06860 0.04113 0.04532 0.02649 0.03514
## Balanced Accuracy 0.77639 0.94578 0.80415 0.72171 0.85105 0.72714
## Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity 0.42955 0.67073 0.83285 0.85976 0.395833
## Specificity 0.98183 0.99119 0.99370 0.98302 0.987906
## Pos Pred Value 0.50000 0.73009 0.87048 0.54231 0.401408
## Neg Pred Value 0.97602 0.98834 0.99152 0.99667 0.987624
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.020078
## Detection Rate 0.01743 0.02301 0.04030 0.01966 0.007948
## Detection Prevalence 0.03486 0.03151 0.04629 0.03625 0.019799
## Balanced Accuracy 0.70569 0.83096 0.91328 0.92139 0.691869
## Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity 0.56911 0.56855 0.150376 0.33815 0.70874
## Specificity 0.95495 0.97964 0.992760 0.99663 0.98909
## Pos Pred Value 0.30973 0.50000 0.444444 0.83571 0.65766
## Neg Pred Value 0.98423 0.98447 0.968088 0.96743 0.99137
## Prevalence 0.03430 0.03458 0.037089 0.04824 0.02872
## Detection Rate 0.01952 0.01966 0.005577 0.01631 0.02036
## Detection Prevalence 0.06302 0.03932 0.012549 0.01952 0.03095
## Balanced Accuracy 0.76203 0.77409 0.571568 0.66739 0.84891
## Class: 22 Class: 23
## Sensitivity 0.86142 0.65663
## Specificity 0.97161 0.98319
## Pos Pred Value 0.53991 0.65465
## Neg Pred Value 0.99452 0.98333
## Prevalence 0.03723 0.04629
## Detection Rate 0.03207 0.03040
## Detection Prevalence 0.05940 0.04643
## Balanced Accuracy 0.91652 0.81991
From the two confusion matrices above, we can conclude that adding more hidden layers and neurons may give the model better performance, because more features can be extracted from the data.
4.2 Model Tuning
Because both models have not yet provided a good enough performance
(best fit), where model_base tends to
underfit and model_bigger tends to
overfit, improvements will be made to
model_bigger. Now, let's try to build
model_tuning by defining the following parameters:
the first layer contains 128 nodes, relu activation function, 784 input shape
the second layer contains 64 nodes, relu activation function
the third layer contains 24 nodes, softmax activation function
# your code here
tensorflow::set_random_seed(8)
model_tuning <- keras_model_sequential(name = "Model-Tuning") %>%
# input layer + hidden layer 1
layer_dense(input_shape = input_dim, # input size (number of predictors)
units = 128, # number of nodes in this layer
activation = 'relu', # activation function
name = 'hidden_1') %>%
# hidden layer 2
layer_dense(units = 64,
activation = 'relu',
name = "hidden_2") %>%
# output layer
layer_dense(units = num_class,
activation = 'softmax',
name='output')
summary(model_tuning)
## Model: "Model-Tuning"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 128) 100480
## hidden_2 (Dense) (None, 64) 8256
## output (Dense) (None, 24) 1560
## ================================================================================
## Total params: 110,296
## Trainable params: 110,296
## Non-trainable params: 0
## ________________________________________________________________________________
Then, compile the model by setting these parameters:
categorical_crossentropy as the loss function
optimizer_adam as the optimizer with learning rate of 0.001
use accuracy as the evaluation metric
# your code here
model_tuning %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = 'accuracy'
)
Last, fit the model using epoch = 10,
batch_size = 150, and set the parameter
shuffle = F so that the samples in each batch are taken in
order (sequentially) rather than randomly.
# your code here
history_tuning <- model_tuning %>% fit(x = train_x_array,
y = train_y_dummy,
epochs = 10,
batch_size = 150,
shuffle = F,
verbose = 1)
plot(history_tuning)
After tuning the model, predict test_x_array using
model_tuning. Please use the predict() function and store
the result as pred_tuning.
# your code here
pred_tuning <- predict(object = model_tuning, x = test_x_array) %>%
k_argmax() %>%
as.array() %>%
as.factor()
Check the model performance using accuracy. We can use the
confusionMatrix() function from the caret package.
Also do the explicit coercion as.factor if your
data is not yet stored as a factor.
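A minimal sketch of the evaluation call (the original chunk is not shown), which would produce the confusion matrix below:
# your code here (assumed; the original chunk is not shown)
confusionMatrix(data = pred_tuning, reference = as.factor(test_y))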
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0 312 0 0 0 0 0 0 0 0 0 0 2 48 0 0 10 0
## 1 0 370 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 267 0 0 1 0 0 0 0 0 0 0 14 0 0 0
## 3 0 4 0 173 0 0 7 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 414 0 0 0 0 0 0 5 19 10 0 0 0
## 5 0 0 21 0 0 214 0 0 0 25 0 0 0 18 0 9 0
## 6 0 0 0 0 0 0 241 31 0 0 0 0 42 12 0 0 0
## 7 0 0 0 0 0 0 36 404 0 0 0 0 0 19 0 0 0
## 8 0 0 0 0 0 0 2 0 171 0 0 0 0 0 0 0 0
## 9 0 37 0 0 0 2 0 0 0 190 0 0 0 0 0 0 6
## 10 0 0 1 0 0 0 0 0 7 0 206 0 0 0 0 0 21
## 11 0 0 0 0 19 0 0 0 0 0 0 232 21 0 0 11 0
## 12 19 0 0 0 0 0 0 0 0 0 0 24 95 0 0 0 0
## 13 0 0 0 0 0 0 0 0 0 0 0 21 0 144 0 0 0
## 14 0 0 0 0 0 0 24 0 0 0 0 0 0 0 338 0 0
## 15 0 0 0 0 0 0 19 0 0 0 0 16 21 21 9 134 0
## 16 0 0 0 30 0 0 11 0 50 58 0 0 0 0 0 0 97
## 17 0 0 0 0 65 0 0 0 14 0 0 94 45 0 0 0 0
## 18 0 0 1 0 0 19 8 1 0 0 0 0 0 8 0 0 0
## 19 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20
## 20 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0
## 21 0 1 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0
## 22 0 0 20 27 0 0 0 0 4 0 3 0 0 0 0 0 0
## 23 0 0 0 0 0 0 0 0 42 58 0 0 0 0 0 0 0
## Reference
## Prediction 17 18 19 20 21 22 23
## 0 2 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 6 0 0 0 0
## 4 41 0 0 0 0 0 0
## 5 0 0 0 20 0 0 0
## 6 0 0 0 0 0 0 0
## 7 20 20 0 0 0 0 0
## 8 42 20 0 0 0 0 21
## 9 0 0 96 66 16 0 20
## 10 0 20 2 0 0 0 21
## 11 11 0 0 0 0 0 0
## 12 26 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0
## 14 0 6 0 0 0 1 0
## 15 1 0 0 0 0 0 0
## 16 0 21 43 50 29 16 59
## 17 103 0 0 0 0 19 0
## 18 0 131 0 0 0 2 21
## 19 0 3 89 0 19 0 0
## 20 0 0 30 159 0 0 0
## 21 0 0 0 40 142 43 0
## 22 0 26 0 11 0 186 0
## 23 0 1 0 0 0 0 190
##
## Overall Statistics
##
## Accuracy : 0.6974
## 95% CI : (0.6867, 0.7081)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6837
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.94260 0.85648 0.86129 0.70612 0.83133 0.86640
## Specificity 0.99094 1.00000 0.99781 0.99755 0.98876 0.98657
## Pos Pred Value 0.83422 1.00000 0.94681 0.91053 0.84663 0.69707
## Neg Pred Value 0.99721 0.99089 0.99376 0.98969 0.98743 0.99519
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.04350 0.05159 0.03723 0.02412 0.05772 0.02984
## Detection Prevalence 0.05215 0.05159 0.03932 0.02649 0.06818 0.04281
## Balanced Accuracy 0.96677 0.92824 0.92955 0.85183 0.91004 0.92648
## Class: 6 Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity 0.69253 0.92661 0.59375 0.57402 0.98565 0.58883
## Specificity 0.98754 0.98590 0.98765 0.96448 0.98966 0.99085
## Pos Pred Value 0.73926 0.80962 0.66797 0.43880 0.74101 0.78912
## Neg Pred Value 0.98437 0.99520 0.98308 0.97908 0.99956 0.97645
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.03360 0.05633 0.02384 0.02649 0.02872 0.03235
## Detection Prevalence 0.04545 0.06958 0.03569 0.06037 0.03876 0.04099
## Balanced Accuracy 0.84004 0.95625 0.79070 0.76925 0.98765 0.78984
## Class: 12 Class: 13 Class: 14 Class: 15 Class: 16
## Sensitivity 0.32646 0.58537 0.97406 0.81707 0.67361
## Specificity 0.98997 0.99697 0.99546 0.98759 0.94778
## Pos Pred Value 0.57927 0.87273 0.91599 0.60633 0.20905
## Neg Pred Value 0.97203 0.98544 0.99868 0.99568 0.99299
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008
## Detection Rate 0.01325 0.02008 0.04713 0.01868 0.01352
## Detection Prevalence 0.02287 0.02301 0.05145 0.03081 0.06470
## Balanced Accuracy 0.65822 0.79117 0.98476 0.90233 0.81070
## Class: 17 Class: 18 Class: 19 Class: 20 Class: 21
## Sensitivity 0.41870 0.52823 0.33459 0.45954 0.68932
## Specificity 0.96578 0.99133 0.99102 0.99341 0.98636
## Pos Pred Value 0.30294 0.68586 0.58940 0.77941 0.59916
## Neg Pred Value 0.97907 0.98324 0.97479 0.97316 0.99077
## Prevalence 0.03430 0.03458 0.03709 0.04824 0.02872
## Detection Rate 0.01436 0.01827 0.01241 0.02217 0.01980
## Detection Prevalence 0.04741 0.02663 0.02105 0.02844 0.03305
## Balanced Accuracy 0.69224 0.75978 0.66280 0.72647 0.83784
## Class: 22 Class: 23
## Sensitivity 0.69663 0.57229
## Specificity 0.98682 0.98523
## Pos Pred Value 0.67148 0.65292
## Neg Pred Value 0.98825 0.97936
## Prevalence 0.03723 0.04629
## Detection Rate 0.02593 0.02649
## Detection Prevalence 0.03862 0.04057
## Balanced Accuracy 0.84173 0.77876
The hidden layers are where information from the data is
extracted. What can we conclude from model_bigger and
model_tuning about hidden layers? The more hidden
layers used (the deeper the network), the more the neural network model
tends to overfit.
Note: Consider the following criteria for this case
The model is considered quite good if the accuracy reaches >= 70%
The model is considered poor if the accuracy is below 70%
Model performance is considered balanced in both train and test data if the difference in accuracy is <= 20%
From the three models above (model_base,
model_bigger, and model_tuning), the best model
to pick is model_tuning, because its accuracy is quite
high and its difference in accuracy between the train data and the test data is
the smallest.
Note: for this case, we consider a model to have high enough accuracy if it obtains an accuracy above 65% on both the train data and the test data.
We have completed the task of building a deep learning model to classify sign language images. This model will be very helpful for people with hearing impairment or loss who communicate with sign language, so that they can communicate with the wider community. This project can be developed further into a sign language-based communication app.