# clean up the environment
rm(list = ls())
# chunk options
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
fig.align = "center",
comment = "#>"
)
This article was made to complete the assignment for the Algoritma Machine Learning Course in Neural Network and Deep Learning, Theia Batch 2022.
In this article I will try to build an image classification model using the Sign Language MNIST dataset (American Sign Language).
The original MNIST image dataset of handwritten digits is a popular benchmark for image-based machine learning methods, but researchers have renewed efforts to update it and develop drop-in replacements that are more challenging for computer vision and closer to real-world applications. As noted for one recent replacement, the Fashion-MNIST dataset, the Zalando researchers quoted the startling claim that “Most pairs of MNIST digits (784 total pixels per sample) can be distinguished pretty well by just one pixel”. To stimulate the community to develop more drop-in replacements, the Sign Language MNIST is presented here and follows the same CSV format, with labels and pixel values in single rows.
The American Sign Language letter database of hand gestures represents a multi-class problem with 24 classes of letters (excluding J and Z, which require motion).
Figure: The complete American Sign Language alphabet, and the letters included in this dataset.
So instead of 26 alphabet classes, we will use 24 classes, excluding J and Z.
#Load Library
library(tufte)
library(ggplot2)
library(tidyverse)
library(dplyr)
library(caret)
library(keras)
library(tensorflow)
The dataset is provided in the form of .csv documents. Each contains one column of labels indicating which alphabet letter is represented in the image; the rest of the columns are the pixel values of each image in the dataset.
There are two .csv documents provided: one for training purposes and one for validation purposes. Each document also contains labels for re-validation purposes.
#Import Dataset
train_hand <- read.csv("data-input/sign_mnist_train.csv")
test_hand <- read.csv("data-input/sign_mnist_test.csv")
#Check Dimension of Dataset
dim(train_hand)
#> [1] 27455 785
dim(test_hand)
#> [1] 7172 785
This dataset contains 27,455 images for training and 7,172 images for validation (testing).
Check the structure of the data using the head() function.
head(train_hand)
# check the range of pixel values across all pixel columns
range(train_hand[, -1]) # grayscale intensities, expected between 0 and 255
Based on the preview above, we see that the data frame contains 785 columns, of which one column is the label and 784 columns contain pixel values. The pixel values are grayscale because we only have one channel (one data frame for one color) in this example.
We want to check how many classes are in the dataset :
sort(unique(train_hand$label))
#> [1] 0 1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
length(sort(unique(train_hand$label)))
#> [1] 24
We see that there are values from 0 to 24, but we only have 24 classes. After looking at the values, we see that the value 9 is missing.
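Before visualizing, a one-line check (just a quick sketch) confirms which label never appears:
# Which label in 0:24 is absent from the training data?
setdiff(0:24, unique(train_hand$label)) # returns 9, matching the gap above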
We will take a closer look at these labels :
#Create Function for Pixels Visualization
vizTrain <- function(input){
dimmax <- sqrt(ncol(input[,-1])) # image side length (28 for 784 pixels)
dimn <- ceiling(sqrt(nrow(input))) # grid size for the plot panels
par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
for (i in 1:nrow(input)){
m1 <- as.matrix(input[i,2:785]) # one image as a 1 x 784 matrix
dim(m1) <- c(28,28) # fold into 28 x 28
m1 <- apply(apply(m1, 1, rev), 1, t) # rotate so the image displays upright
image(1:28, 1:28,
m1, col=grey.colors(255),
# remove axis text
xaxt = 'n', yaxt = 'n')
text(2, 20, col="white", cex=1.2, input[i, 1]) # overlay the class label
}
}
First, we take a look at label 0 :
# Check Label 0
vizTrain(head(train_hand[train_hand$label == 0,], 9))
We see that class 0 represents A in the alphabet, based on the ASL picture above in this article.
Next, we take a look at labels 8 and 10 :
# Check Label 8
vizTrain(head(train_hand[train_hand$label == 8,], 9))
# Check Label 10
vizTrain(head(train_hand[train_hand$label == 10,], 9))
We see that the classes are sorted from 0 to 24 and represent A to Y. Because of that, we change the labels accordingly to eliminate the missing value 9, both for the training and the testing dataset.
(If anyone has an opinion on better code, I’m open to corrections! Please give me some insight. A more compact alternative is sketched after these two blocks.)
#Replace 0 with 1 ~ 8 with 9, Training Dataset.
train_hand <- train_hand %>% mutate(label = ifelse(label == 0, 1,
ifelse(label == 1, 2,
ifelse(label == 2, 3,
ifelse(label == 3, 4,
ifelse(label == 4, 5,
ifelse(label == 5, 6,
ifelse(label == 6, 7,
ifelse(label == 7, 8,
ifelse(label == 8, 9, label)
)
)
)
)
)
)
)
)
) %>%
mutate(label = as.numeric(label) - 1)
#Replace 0 with 1 ~ 8 with 9, Testing Dataset
test_hand <- test_hand %>% mutate(label = ifelse(label == 0, 1,
ifelse(label == 1, 2,
ifelse(label == 2, 3,
ifelse(label == 3, 4,
ifelse(label == 4, 5,
ifelse(label == 5, 6,
ifelse(label == 6, 7,
ifelse(label == 7, 8,
ifelse(label == 8, 9, label)
)
)
)
)
)
)
)
)
) %>%
mutate(label = as.numeric(label) - 1)
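For reference, here is a more compact equivalent, a sketch with the same behavior as the two nested-ifelse blocks above (run it instead of them, not in addition): every label above 9 simply shifts down by one, and labels 0 ~ 8 stay as they are.
#Compact alternative to the nested ifelse re-indexing above
train_hand <- train_hand %>% mutate(label = ifelse(label > 9, label - 1, label))
test_hand <- test_hand %>% mutate(label = ifelse(label > 9, label - 1, label))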
We check the proportion of each class in the training dataset :
# Check Proportion of Each Class
prop.table(table(train_hand$label))
#>
#> 0 1 2 3 4 5 6
#> 0.04101257 0.03678747 0.04166818 0.04356219 0.03485704 0.04385358 0.03970133
#> 7 8 9 10 11 12 13
#> 0.03689674 0.04232380 0.04057549 0.04520124 0.03842652 0.04192315 0.04356219
#> 14 15 16 17 18 19 20
#> 0.03962848 0.04658532 0.04713167 0.04367146 0.04319796 0.04228738 0.03940994
#> 21 22 23
#> 0.04461847 0.04239665 0.04072118
Based on the values above, the class proportions are reasonably balanced, so we decided to use this dataset further without changing the proportions, although we will still use data augmentation in the CNN model.
A neural network is a type of machine learning model that is inspired by the structure and function of the brain. It is made up of layers of interconnected nodes, which process and transmit information. Each node receives input from other nodes, performs a computation on that input, and produces an output that is transmitted to other nodes. The weights of the connections between nodes can be adjusted based on input and output data, allowing the neural network to “learn” from the data. Neural networks are commonly used for tasks such as image and speech recognition, language translation, and predictive modeling.
Figure: Neural Network Model
Input Layer : the first layer of a neural network, responsible for accepting the input data and forwarding it to the other layers of the network for processing.
Hidden Layers : responsible for performing computations on the input data and transmitting the results toward the output layer. They are usually composed of a large number of interconnected nodes, each performing a non-linear computation on its input and passing the output to the next layer.
Output Layer : the last layer of a neural network, responsible for producing the final output. The output may be a prediction, a classification, or some other type of computation, depending on the task that the network is designed to perform.
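To make the node computation described above concrete, here is a minimal sketch with toy values (nothing from the dataset): a single dense layer multiplies its input by a weight matrix, adds a bias, and applies an activation.
# One dense-layer forward pass: hidden = activation(W %*% input + bias)
relu <- function(x) pmax(x, 0) # the ReLU activation used later in the models
set.seed(1)
input <- runif(4) # toy input: 4 input nodes
W <- matrix(rnorm(3 * 4), nrow = 3) # toy weights: 3 hidden nodes x 4 inputs
bias <- rnorm(3) # toy biases, one per hidden node
relu(W %*% input + bias) # output of one hidden layer (3 values)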
We separate the data frame into label and pixel values. Because the pixel values lie on a 0 - 255 grayscale scale, we divide them by 255 so the values fall between 0 and 1. This scaling helps the training converge faster and more stably.
#Separate Training Pixel Value
train_x <- train_hand %>%
select(-label) %>%
as.matrix()/255 #divide by 255 so the value 0 ~ 1
#Separate Train Label
train_y <- train_hand$label
#Separate Test Pixel Value
test_x <- test_hand %>%
select(-label) %>%
as.matrix()/255 #divide by 255 so the value 0 ~ 1
#Separate Test Label
test_y <- test_hand$label
We process the labels with one-hot encoding. This converts categorical variables, which have a finite set of possible values, into a numerical representation that can be used as input to machine learning algorithms.
In one-hot encoding, each categorical value is represented as a binary vector, with a 1 in the position corresponding to that value and 0s in all other positions. For example, for three categories A, B, and C, the one-hot encodings would be:
A: [1, 0, 0] B: [0, 1, 0] C: [0, 0, 1]
One-hot encoding is often used when working with categorical data in machine learning, because many algorithms cannot handle categorical variables directly. By encoding the categorical variables into a numerical representation, we can use these algorithms with our data.
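As a quick illustration with toy labels (not the dataset), to_categorical() from keras produces exactly this kind of binary matrix:
#Toy example of one-hot encoding
to_categorical(c(0, 1, 2), num_classes = 3) # rows: [1,0,0], [0,1,0], [0,0,1]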
#one hot encoding
train_y <- to_categorical(train_y , num_classes = 24)
test_y <- to_categorical(test_y , num_classes = 24)
We take input_dim from the number of columns in train_x (basically, how many pixels there are), and num_class from the number of distinct values of label in the dataset.
input_dim <- ncol(train_x)
num_class <- n_distinct(train_hand$label)
We create an initial model named model_0. We will analyze its results further to make better models.
#set random weight
set_random_seed(100)
# Create Model Architecture
model_0 <- keras_model_sequential(name = "model_0") %>%
# input layer + hidden layer 1
layer_dense(units = 64, #number of nodes in the hidden layer
activation = "relu", #ReLU is a standard activation for hidden layers
name = "hidden_1",
input_shape = input_dim #the number of input nodes equals the input dimension
) %>%
# hidden layer 2
layer_dense(units = 16,
activation = "relu",
name = "hidden_2"
) %>%
# output layer
layer_dense(units = num_class,
activation = "softmax", #activation function on output layer, we use "softmax" because the output is multiple classes
name = "output")
model_0
#> Model: "model_0"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> hidden_1 (Dense) (None, 64) 50240
#> ________________________________________________________________________________
#> hidden_2 (Dense) (None, 16) 1040
#> ________________________________________________________________________________
#> output (Dense) (None, 24) 408
#> ================================================================================
#> Total params: 51,688
#> Trainable params: 51,688
#> Non-trainable params: 0
#> ________________________________________________________________________________
We set the loss function and the optimizer that updates the weights at every epoch of learning.
model_0 %>%
compile(loss = "categorical_crossentropy", #multi-class target, so we use "categorical_crossentropy"
optimizer = optimizer_adam(learning_rate = 0.01), #adam optimizer works better in this case (personal trial)
metrics = "accuracy")
We fit the model to see its performance from the first epoch to the last.
In this section, an epoch is one full pass over the training dataset, so the learning is split into 20 passes. Batch size is the number of images processed before each weight update.
We use epoch = 20 because we didn’t see any further change beyond 20 epochs; batch_size = 128 was acquired by personal tweaking on this dataset, and a bigger batch size did not give the model any additional benefit.
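As a quick arithmetic check on these choices: with 27,455 training images and a batch size of 128, each epoch performs about 215 weight updates.
#Number of weight updates (mini-batches) per epoch
ceiling(27455 / 128) # = 215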
history0 <- model_0 %>% fit(x = train_x, #Predictor using dataset from `train_x`
y = train_y, #label
epoch = 20, #iteration number
batch_size = 128, #number of images per weight update (mini-batch)
validation_data = list( test_x , test_y ), #cross validation data
verbose = 1) #for tracing the result
hist0 <- plot(history0)
ggsave("data-output/plot/hist0.png", hist0)
history0
Insight :
We change the number of nodes in the first and second hidden layers. We also use a smaller learning rate (0.001) so the model can follow the learning curve better.
set_random_seed(100)
model_1 <- keras_model_sequential(name = "model_1") %>%
# input layer + hidden layer 1
layer_dense(units = 128,
activation = "relu",
name = "hidden_1",
input_shape = input_dim
) %>%
# hidden layer 2
layer_dense(units = 64,
activation = "relu",
name = "hidden_2"
) %>%
# output layer
layer_dense(units = num_class,
activation = "softmax",
name = "output")
model_1
#> Model: "model_1"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> hidden_1 (Dense) (None, 128) 100480
#> ________________________________________________________________________________
#> hidden_2 (Dense) (None, 64) 8256
#> ________________________________________________________________________________
#> output (Dense) (None, 24) 1560
#> ================================================================================
#> Total params: 110,296
#> Trainable params: 110,296
#> Non-trainable params: 0
#> ________________________________________________________________________________
model_1 %>%
compile(loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = "accuracy")
# train model
history1 <- model_1 %>% fit(x = train_x,
y = train_y,
epoch = 20,
batch_size = 128,
validation_data = list( test_x , test_y ),
verbose = 1)
hist1 <- plot(history1)
ggsave("data-output/plot/hist1.png", hist1)
history1
Insight :
Epoch : 12
We add another hidden layer and change the number of nodes in each hidden layer. We also use a smaller learning rate for this iteration :
# Set seed for the initial weights
set_random_seed(100)
# Create the model architecture
model_2 <- keras_model_sequential(name = "model_2") %>%
# input layer + hidden layer 1
layer_dense(units = 512,
activation = "relu",
name = "hidden_1",
input_shape = input_dim
) %>%
# hidden layer 2
layer_dense(units = 128,
activation = "relu",
name = "hidden_2"
) %>%
# hidden layer 3
layer_dense(units = 32,
activation = "relu",
name = "hidden_3"
) %>%
# output layer
layer_dense(units = num_class,
activation = "softmax",
name = "output")
model_2
#> Model: "model_2"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> hidden_1 (Dense) (None, 512) 401920
#> ________________________________________________________________________________
#> hidden_2 (Dense) (None, 128) 65664
#> ________________________________________________________________________________
#> hidden_3 (Dense) (None, 32) 4128
#> ________________________________________________________________________________
#> output (Dense) (None, 24) 792
#> ================================================================================
#> Total params: 472,504
#> Trainable params: 472,504
#> Non-trainable params: 0
#> ________________________________________________________________________________
model_2 %>%
compile(loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.0005),
metrics = "accuracy")
history2 <- model_2 %>% fit(x = train_x,
y = train_y,
epoch = 20,
batch_size = 128,
validation_data = list( test_x , test_y ),
verbose = 1)
hist2 <- plot(history2)
ggsave("data-output/plot/hist2.png", hist2)
history2
Insight :
A convolutional neural network (CNN) is a type of neural network specifically designed for image and video analysis. It is particularly effective at identifying patterns and features in images, making it a powerful tool for image classification, object detection, and other image processing tasks.
CNNs are characterized by their use of convolutional layers, which apply a set of filters to the input data to extract features and create a representation of the input that is easier to process. These filters are called kernels or weights, and they are adjusted during the training process to optimize the network’s performance.
The convolutional layers are followed by one or more fully connected layers, which perform a traditional neural network computation on the output of the convolutional layers. This combination of convolutional and fully connected layers allows CNNs to learn and recognize patterns and features in images and other data, while also being able to classify or predict based on those patterns.
CNNs are widely used in a variety of applications, including image and video classification, object detection, image generation, and many others.
Figure: Convolutional Neural Network
Basically, compared to a common neural network, a convolutional neural network (CNN) processes the image as a matrix array, scanning the image with a square filter over a square image.
It respects the configuration of each pixel on the x and y axes, so information is gained from the relationships between those pixels. After several filters are applied, the model acquires several features of the image and the dimensions become much smaller, so each filter effectively covers a larger portion of the image. The result is then flattened into a 1D array and processed by a common neural network.
Figure: CNN Matrix
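To make the filter-scanning idea concrete, here is a tiny hand-rolled sketch (a toy 4x4 image and a 2x2 kernel, nothing from the dataset) of one convolution pass with stride 1 and no padding:
#Toy convolution: slide a 2x2 kernel over a 4x4 image (stride 1, no padding)
img <- matrix(1:16, nrow = 4, byrow = TRUE) # toy "image"
kern <- matrix(c(1, 0, 0, -1), nrow = 2) # toy 2x2 kernel
out <- matrix(0, 3, 3) # resulting 3x3 feature map
for (i in 1:3) {
for (j in 1:3) {
out[i, j] <- sum(img[i:(i + 1), j:(j + 1)] * kern) # weighted sum over the patch
}
}
out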
We use several layers in this modelling :
Convolutional Layer : applies a set of convolutional filters, each applying a set of weights to small regions of the input data, to extract features and create a representation of the input that is easier to process. The output is a feature map that captures certain features of the image.
Normalization Layer : used to improve the training and generalization of a neural network by reducing internal covariate shift, which is the change in the distribution of a layer’s inputs caused by adjustments to the previous layer’s weights.
Max Pooling Layer : a downsampling technique that reduces the dimensionality of the input data by taking the maximum value of each group of adjacent elements (see the size check after this list).
Dropout Layer : a regularization technique for reducing overfitting in neural networks. It is implemented as a layer that “drops out” a random subset of the previous layer’s activations during training by setting them to zero.
Flattening Layer : flattens the output of a convolutional or pooling layer into a single vector.
After flattening, the rest is the same as the traditional form of a neural network.
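As a quick size check (plain arithmetic that mirrors the CNN model summary shown later): with padding = "same", a stride-1 convolution preserves height and width, while each 2x2 max pooling with stride 2 halves them, rounding up.
#Feature-map side length after each 2x2, stride-2, padding-"same" max pooling
size <- 28
for (i in 1:3) {
size <- ceiling(size / 2)
cat("after pooling", i, ":", size, "x", size, "\n")
}
# 28 -> 14 -> 7 -> 4, matching the model summary below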
Separate the pixel columns and the label, just like in the first method.
train_x_2 <- train_hand %>%
select(-label)
train_y_2 <- train_hand$label
test_x_2 <- test_hand %>%
select(-label)
test_y_2 <- test_hand$label
dim(train_x_2)
#> [1] 27455 784
dim(test_x_2)
#> [1] 7172 784
We have one dimension of values (pixel1 ~ pixel784); we need to reshape it into a height and a width.
How do we do it? Transform the matrix array.
sqrt(784)
#> [1] 28
Based on the number of pixels (784), the height and width are acquired by square-rooting the value; the result is 28.
Because the initial form is rows as the number of samples and then pixels, we need to transform it first into pixels and then rows. I got help from someone :
“When R handles indexing for arrays or matrices, it assumes that the ordering is primarily on the columns. You, however, appear to want to create a 25 x 25 matrix-slice based on successive rows of that larger matrix, so the first thing to do is transpose so the row values are in columns:”
# transpose the matrix array
train_x_img <- t(train_x_2)
test_x_img <- t(test_x_2)
dim(train_x_img)
#> [1] 784 27455
dim(test_x_img)
#> [1] 784 7172
Fold the first dimension into a 28x28 square. I also add one more dimension at the end carrying the grayscale channel, so the array can be received by the CNN model. The CNN only accepts the Convolution2D input shape (n_samples, height, width, channels).
#fold the 1D, add one dimension more with value = 1
dim(train_x_img) <- c(28,28, 27455, 1)
dim(test_x_img) <- c(28,28, 7172, 1)
dim(train_x_img)
#> [1] 28 28 27455 1
dim(test_x_img)
#> [1] 28 28 7172 1
Switch the positions of the matrix array dimensions so it is compatible with the CNN input shape.
train_x_img <- aperm(train_x_img, c(3, 1,2, 4))
test_x_img <- aperm(test_x_img, c(3, 1,2, 4))
dim(train_x_img)
#> [1] 27455 28 28 1
dim(test_x_img)
#> [1] 7172 28 28 1
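A quick sanity check (just a sketch) that the transpose-fold-permute sequence preserved each image: the first 28 pixel values of the first image should equal pixel1 ~ pixel28 of the first row in the original data frame.
#Verify the reshape: first 28 pixels of image 1 vs. the original row
all.equal(as.numeric(train_x_img[1, , 1, 1]),
as.numeric(unlist(train_x_2[1, 1:28]))) # should be TRUE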
Set the target_size and batch_size so they are uniform throughout the code without manually typing the numbers, and for easier tweaking of hyper-parameters.
# Desired height and width of images
target_size <- c(28, 28)
# Batch size for training the model
batch_size <- 128
Next, we use an image data generator to feed the images to the model during learning, and to further enrich our dataset using image augmentation and improve the model performance.
Image Augmentation
The image_data_generator() function allows you to
specify a set of image processing and augmentation techniques to be
applied to the input images. This can include techniques such as
resizing, cropping, and normalization, as well as more advanced
techniques such as horizontal flipping, rotation, and color
shifting.
By using it, you can easily create a data generator that can be used to feed images to your model in small batches during training. The generator will apply the specified image processing and augmentation techniques to each batch of images on the fly, allowing you to train your model with a virtually infinite amount of augmented data.
Overall, the image_data_generator() function is a
powerful tool for preparing image data for training deep learning
models, and can greatly improve the performance and generalization of
your model.
# Image Generator
train_data_gen <- image_data_generator(featurewise_center = FALSE,
samplewise_center = FALSE,
featurewise_std_normalization = FALSE,
samplewise_std_normalization = FALSE,
zca_whitening = FALSE,
rotation_range = 10,
zoom_range = 0.1,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=FALSE,
vertical_flip=FALSE,
rescale=1/255)
# Note: we mirror the training augmentation here; commonly, only rescale = 1/255 is applied to validation data
test_data_gen <- image_data_generator(featurewise_center = FALSE,
samplewise_center = FALSE,
featurewise_std_normalization = FALSE,
samplewise_std_normalization = FALSE,
zca_whitening = FALSE,
rotation_range = 10,
zoom_range = 0.1,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=FALSE,
vertical_flip=FALSE,
rescale=1/255)
Create image array generators for processing the matrix arrays of pixels into square images that can be read by the model.
# Training Dataset
train_image_array_gen <- flow_images_from_data(train_x_img,
y = train_y_2,
batch_size = batch_size, #match the batch size used in steps_per_epoch below
seed = 123,
#save_to_dir = "data-output/training", #only use it for testing purposes, to see if the images come out right
generator = train_data_gen)
# Validation Dataset
val_image_array_gen <- flow_images_from_data(test_x_img,
y = test_y_2,
batch_size = batch_size,
seed = 123,
#save_to_dir = "data-output/validation",
generator = test_data_gen)
Get the number of samples for both training and validation data.
# Number of training samples
train_samples <- train_image_array_gen$n
# Number of validation samples
valid_samples <- val_image_array_gen$n
We make only one model for the CNN method since it takes too long to knit. This model is the result of tweaking hyperparameters and re-learning the steps and architecture. I was also inspired by many already-published models from others when putting parameters and layers into this architecture.
tensorflow::tf$random$set_seed(123)
model_CNN <- keras_model_sequential() %>%
# First convolutional layer
layer_conv_2d(filters = 75, #the number of filters usually starts large and is reduced layer after layer
kernel_size = c(3,3), # 3 x 3 filters
strides = 1,
padding = "same",
activation = "relu", #the same as the 1st method, because the data is images
input_shape = c(target_size, 1)
) %>%
# Layer Normalization
layer_batch_normalization() %>% #normalize the output of the 1st convolutional layer
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2), #pooling with 2x2 size frame
strides = 2,
padding = "same") %>%
# Second convolutional layer
layer_conv_2d(filters = 50,
kernel_size = c(3,3), # 3 x 3 filters
strides = 1,
padding = "same",
activation = "relu"
) %>%
# Layer Drop Out
layer_dropout(rate = 0.2) %>%
# a dropout rate of 0.2 means that 20% of the activations will be set to zero during each training iteration
# Layer Normalization
layer_batch_normalization() %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2),
strides = 2,
padding = "same") %>%
# Third convolutional layer
layer_conv_2d(filters = 25,
kernel_size = c(3,3), # 3 x 3 filters
strides = 1,
padding = "same",
activation = "relu"
) %>%
# Layer Normalization
layer_batch_normalization() %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2),
strides = 2,
padding = "same") %>%
# Flattening layer
layer_flatten() %>%
# Dense layer
layer_dense(units = 512,
activation = "relu") %>%
# Layer Drop Out
layer_dropout(rate = 0.3) %>%
# Output layer
layer_dense(name = "Output",
units = 24,
activation = "softmax") #multiclass case
model_CNN
#> Model: "sequential"
#> _____________________________________________________________________
#> Layer (type) Output Shape Param #
#> =====================================================================
#> conv2d_2 (Conv2D) (None, 28, 28, 75) 750
#> _____________________________________________________________________
#> batch_normalization_2 (BatchNo (None, 28, 28, 75) 300
#> _____________________________________________________________________
#> max_pooling2d_2 (MaxPooling2D) (None, 14, 14, 75) 0
#> _____________________________________________________________________
#> conv2d_1 (Conv2D) (None, 14, 14, 50) 33800
#> _____________________________________________________________________
#> dropout_1 (Dropout) (None, 14, 14, 50) 0
#> _____________________________________________________________________
#> batch_normalization_1 (BatchNo (None, 14, 14, 50) 200
#> _____________________________________________________________________
#> max_pooling2d_1 (MaxPooling2D) (None, 7, 7, 50) 0
#> _____________________________________________________________________
#> conv2d (Conv2D) (None, 7, 7, 25) 11275
#> _____________________________________________________________________
#> batch_normalization (BatchNorm (None, 7, 7, 25) 100
#> _____________________________________________________________________
#> max_pooling2d (MaxPooling2D) (None, 4, 4, 25) 0
#> _____________________________________________________________________
#> flatten (Flatten) (None, 400) 0
#> _____________________________________________________________________
#> dense (Dense) (None, 512) 205312
#> _____________________________________________________________________
#> dropout (Dropout) (None, 512) 0
#> _____________________________________________________________________
#> Output (Dense) (None, 24) 12312
#> =====================================================================
#> Total params: 264,049
#> Trainable params: 263,749
#> Non-trainable params: 300
#> _____________________________________________________________________
sparse_categorical_crossentropy is more efficient to
compute and is preferred when the number of classes is large, as it does
not require the one-hot encoding of the true output data.
categorical_crossentropy is preferred when the number of
classes is small and the one-hot encoding is more practical.
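Concretely, with toy labels (just to illustrate the two input formats): sparse_categorical_crossentropy consumes integer labels directly, while categorical_crossentropy expects the one-hot matrix we built earlier with to_categorical().
#The same labels in the two formats the two losses expect
labels_sparse <- c(0, 5, 23) # integer labels, for sparse_categorical_crossentropy
labels_onehot <- to_categorical(labels_sparse, num_classes = 24) # for categorical_crossentropy
dim(labels_onehot) # 3 x 24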
model_CNN %>%
compile(
loss = "sparse_categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.0001),
metrics = "accuracy"
)
learning_rate_reduction <- callback_reduce_lr_on_plateau(
monitor = "val_accuracy",
patience = 2,
verbose = 1,
factor = 0.5,
min_lr = 0.00001
)
history4 <- model_CNN %>%
fit_generator(
# training data
train_image_array_gen,
# steps per epoch and number of epochs
steps_per_epoch = as.integer(train_samples / batch_size),
epochs = 30,
# validation data
validation_data = val_image_array_gen,
validation_steps = as.integer(valid_samples / batch_size),
# print progress but don't create graphic
verbose = 1#,
#callbacks = list(learning_rate_reduction)
)
hist4 <- plot(history4)
ggsave("data-output/plot/hist4.png", hist4)
history4
Insight :
First we load the models; I saved them because compiling and fitting takes too long to wait for when knitting.
# load model
model_CNN <- load_model_tf("data-output/model_CNN")
model_1 <- load_model_tf("data-output/model_NN1")
model_2 <- load_model_tf("data-output/model_NN2")
Create predictions on the test dataset.
#predict for different model
pred_test_cnn <- predict_classes(model_CNN, test_x_img/255)
pred_test_nn1 <- predict_classes(model_1, test_x)
pred_test_nn2 <- predict_classes(model_2, test_x)
Convert the actual labels into an array so they can be processed by confusionMatrix().
actual_test <- as.array(test_hand$label)
class(actual_test)
#> [1] "array"
Create a decode function to convert the encoding 0 ~ 23 into the alphabet letter represented by the image.
# Convert encoding to label
decode <- function(x){
case_when(x == 0 ~ "A",
x == 1 ~ "B",
x == 2 ~ "C",
x == 3 ~ "D",
x == 4 ~ "E",
x == 5 ~ "F",
x == 6 ~ "G",
x == 7 ~ "H",
x == 8 ~ "I",
x == 9 ~ "K",
x == 10 ~ "L",
x == 11 ~ "M",
x == 12 ~ "N",
x == 13 ~ "O",
x == 14 ~ "P",
x == 15 ~ "Q",
x == 16 ~ "R",
x == 17 ~ "S",
x == 18 ~ "T",
x == 19 ~ "U",
x == 20 ~ "V",
x == 21 ~ "W",
x == 22 ~ "X",
x == 23 ~ "Y"
)
}
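A more compact equivalent of this mapping, as a sketch (decode_compact is a hypothetical helper, not from the original code): index into R’s built-in LETTERS with J (the 10th letter) and Z (the 26th) removed.
#Compact alternative: LETTERS without J and Z, indexed by the 0-based code
decode_compact <- function(x) LETTERS[-c(10, 26)][x + 1]
decode_compact(c(0, 8, 9, 23)) # "A" "I" "K" "Y"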
#decode the result and actual label
pred_test_cnn <- sapply(pred_test_cnn, decode)
pred_test_nn1 <- sapply(pred_test_nn1, decode)
pred_test_nn2 <- sapply(pred_test_nn2, decode)
actual_test <- sapply(actual_test, decode)
confusionMatrix(as.factor(pred_test_cnn),
as.factor(actual_test)
)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction A B C D E F G H I K L M N O P Q R
#> A 331 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> B 0 432 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> C 0 0 310 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> D 0 0 0 245 0 0 0 0 0 0 0 0 0 0 0 0 0
#> E 0 0 0 0 498 0 0 0 0 0 0 0 0 0 0 0 0
#> F 0 0 0 0 0 247 0 0 0 0 0 0 0 0 0 0 0
#> G 0 0 0 0 0 0 348 0 0 0 0 0 0 0 0 0 0
#> H 0 0 0 0 0 0 0 436 0 0 0 0 0 0 0 0 0
#> I 0 0 0 0 0 0 0 0 288 0 0 0 0 0 0 0 0
#> K 0 0 0 0 0 0 0 0 0 331 0 0 0 0 0 0 0
#> L 0 0 0 0 0 0 0 0 0 0 209 0 0 0 0 0 0
#> M 0 0 0 0 0 0 0 0 0 0 0 394 0 0 0 0 0
#> N 0 0 0 0 0 0 0 0 0 0 0 0 291 0 0 0 0
#> O 0 0 0 0 0 0 0 0 0 0 0 0 0 246 0 0 0
#> P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 347 0 0
#> Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 164 0
#> R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 144
#> S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> U 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> Reference
#> Prediction S T U V W X Y
#> A 0 0 0 0 0 0 0
#> B 0 0 0 0 0 0 0
#> C 0 0 0 0 0 0 0
#> D 0 0 0 0 0 0 0
#> E 9 0 0 0 0 0 0
#> F 0 0 0 0 0 0 0
#> G 0 0 0 0 0 0 0
#> H 0 1 0 0 0 0 0
#> I 0 0 0 0 0 0 0
#> K 0 0 0 0 0 0 0
#> L 0 0 0 0 0 0 0
#> M 0 0 0 0 0 0 0
#> N 0 0 0 0 0 0 0
#> O 0 0 0 0 0 0 0
#> P 0 0 0 0 0 0 0
#> Q 0 0 0 0 0 0 0
#> R 0 0 0 0 0 0 0
#> S 237 0 0 0 0 0 0
#> T 0 247 0 0 0 0 0
#> U 0 0 266 0 0 0 0
#> V 0 0 0 346 0 0 20
#> W 0 0 0 0 206 0 0
#> X 0 0 0 0 0 267 0
#> Y 0 0 0 0 0 0 312
#>
#> Overall Statistics
#>
#> Accuracy : 0.9958
#> 95% CI : (0.994, 0.9972)
#> No Information Rate : 0.0694
#> P-Value [Acc > NIR] : < 2.2e-16
#>
#> Kappa : 0.9956
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
#> Specificity 1.00000 1.00000 1.00000 1.00000 0.99865 1.00000
#> Pos Pred Value 1.00000 1.00000 1.00000 1.00000 0.98225 1.00000
#> Neg Pred Value 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
#> Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
#> Detection Rate 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
#> Detection Prevalence 0.04615 0.06023 0.04322 0.03416 0.07069 0.03444
#> Balanced Accuracy 1.00000 1.00000 1.00000 1.00000 0.99933 1.00000
#> Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
#> Specificity 1.00000 0.99985 1.00000 1.00000 1.00000 1.00000
#> Pos Pred Value 1.00000 0.99771 1.00000 1.00000 1.00000 1.00000
#> Neg Pred Value 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
#> Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
#> Detection Rate 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
#> Detection Prevalence 0.04852 0.06093 0.04016 0.04615 0.02914 0.05494
#> Balanced Accuracy 1.00000 0.99993 1.00000 1.00000 1.00000 1.00000
#> Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity 1.00000 1.0000 1.00000 1.00000 1.00000 0.96341
#> Specificity 1.00000 1.0000 1.00000 1.00000 1.00000 1.00000
#> Pos Pred Value 1.00000 1.0000 1.00000 1.00000 1.00000 1.00000
#> Neg Pred Value 1.00000 1.0000 1.00000 1.00000 1.00000 0.99870
#> Prevalence 0.04057 0.0343 0.04838 0.02287 0.02008 0.03430
#> Detection Rate 0.04057 0.0343 0.04838 0.02287 0.02008 0.03305
#> Detection Prevalence 0.04057 0.0343 0.04838 0.02287 0.02008 0.03305
#> Balanced Accuracy 1.00000 1.0000 1.00000 1.00000 1.00000 0.98171
#> Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity 0.99597 1.00000 1.00000 1.00000 1.00000 0.93976
#> Specificity 1.00000 1.00000 0.99707 1.00000 1.00000 1.00000
#> Pos Pred Value 1.00000 1.00000 0.94536 1.00000 1.00000 1.00000
#> Neg Pred Value 0.99986 1.00000 1.00000 1.00000 1.00000 0.99708
#> Prevalence 0.03458 0.03709 0.04824 0.02872 0.03723 0.04629
#> Detection Rate 0.03444 0.03709 0.04824 0.02872 0.03723 0.04350
#> Detection Prevalence 0.03444 0.03709 0.05103 0.02872 0.03723 0.04350
#> Balanced Accuracy 0.99798 1.00000 0.99854 1.00000 1.00000 0.96988
confusionMatrix(as.factor(pred_test_nn1),
as.factor(actual_test)
)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction A B C D E F G H I K L M N O P Q R
#> A 308 0 0 0 0 0 0 0 5 0 0 0 45 0 0 0 0
#> B 0 387 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0
#> C 0 0 268 0 0 2 0 5 0 0 18 0 0 22 0 0 0
#> D 0 17 0 209 0 0 0 0 0 0 0 0 13 0 0 0 0
#> E 0 0 0 0 434 0 0 21 0 0 0 81 32 19 0 0 0
#> F 0 0 21 0 0 208 0 0 20 63 0 0 0 22 0 19 0
#> G 0 0 0 0 0 3 243 35 0 0 21 0 0 0 0 0 0
#> H 0 0 0 0 0 0 39 375 0 0 0 0 0 0 0 0 0
#> I 1 0 0 0 0 0 1 0 203 0 0 2 0 0 9 0 0
#> K 0 0 0 0 0 4 0 0 0 105 0 0 0 0 0 0 0
#> L 0 0 0 0 0 0 0 0 0 0 157 0 0 0 0 0 0
#> M 0 0 0 0 1 0 2 0 0 20 0 190 36 0 0 1 0
#> N 0 0 0 2 0 0 0 0 0 0 0 21 131 0 0 0 0
#> O 22 0 21 0 0 0 0 0 0 0 0 5 5 144 0 0 0
#> P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 328 0 0
#> Q 0 0 0 0 0 0 41 0 10 0 0 3 5 12 10 126 0
#> R 0 22 0 0 0 0 0 0 24 42 0 0 0 0 0 0 58
#> S 0 0 0 0 63 0 0 0 0 21 0 92 5 0 0 18 2
#> T 0 0 0 0 0 19 22 0 11 0 0 0 16 27 0 0 0
#> U 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 55
#> V 0 0 0 4 0 3 0 0 4 3 0 0 0 0 0 0 23
#> W 0 6 0 0 0 8 0 0 1 58 6 0 0 0 0 0 6
#> X 0 0 0 20 0 0 0 0 0 0 7 0 3 0 0 0 0
#> Y 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0
#> Reference
#> Prediction S T U V W X Y
#> A 0 0 0 0 0 0 0
#> B 0 0 0 0 0 0 18
#> C 0 0 0 0 0 0 0
#> D 0 0 18 0 0 0 0
#> E 24 0 0 0 0 0 0
#> F 0 0 0 20 0 0 0
#> G 0 0 0 0 0 0 0
#> H 20 0 0 2 0 0 0
#> I 41 21 0 0 0 1 24
#> K 0 0 21 1 0 0 0
#> L 0 0 0 0 0 0 41
#> M 41 0 0 0 0 0 0
#> N 21 0 0 0 0 0 4
#> O 0 0 0 0 0 0 0
#> P 0 0 0 19 0 5 0
#> Q 0 0 0 1 0 0 0
#> R 0 0 0 0 6 0 0
#> S 99 0 0 0 0 16 20
#> T 0 164 0 23 20 21 21
#> U 0 0 174 1 20 0 0
#> V 0 2 32 162 10 0 40
#> W 0 0 21 117 150 51 0
#> X 0 61 0 0 0 173 0
#> Y 0 0 0 0 0 0 164
#>
#> Overall Statistics
#>
#> Accuracy : 0.6916
#> 95% CI : (0.6807, 0.7023)
#> No Information Rate : 0.0694
#> P-Value [Acc > NIR] : < 2.2e-16
#>
#> Kappa : 0.6773
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity 0.93051 0.89583 0.86452 0.85306 0.87149 0.84211
#> Specificity 0.99269 0.99585 0.99315 0.99307 0.97348 0.97617
#> Pos Pred Value 0.86034 0.93253 0.85079 0.81323 0.71031 0.55764
#> Neg Pred Value 0.99662 0.99334 0.99387 0.99479 0.99025 0.99426
#> Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
#> Detection Rate 0.04294 0.05396 0.03737 0.02914 0.06051 0.02900
#> Detection Prevalence 0.04992 0.05786 0.04392 0.03583 0.08519 0.05201
#> Balanced Accuracy 0.96160 0.94584 0.92883 0.92307 0.92248 0.90914
#> Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity 0.69828 0.86009 0.70486 0.31722 0.75120 0.48223
#> Specificity 0.99135 0.99094 0.98547 0.99620 0.99411 0.98510
#> Pos Pred Value 0.80464 0.86009 0.66997 0.80153 0.79293 0.65292
#> Neg Pred Value 0.98472 0.99094 0.98763 0.96790 0.99254 0.97035
#> Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
#> Detection Rate 0.03388 0.05229 0.02830 0.01464 0.02189 0.02649
#> Detection Prevalence 0.04211 0.06079 0.04225 0.01827 0.02761 0.04057
#> Balanced Accuracy 0.84481 0.92552 0.84517 0.65671 0.87265 0.73367
#> Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity 0.45017 0.58537 0.94524 0.76829 0.402778 0.40244
#> Specificity 0.99302 0.99235 0.99648 0.98830 0.986625 0.96578
#> Pos Pred Value 0.73184 0.73096 0.93182 0.60577 0.381579 0.29464
#> Neg Pred Value 0.97712 0.98538 0.99721 0.99454 0.987749 0.97850
#> Prevalence 0.04057 0.03430 0.04838 0.02287 0.020078 0.03430
#> Detection Rate 0.01827 0.02008 0.04573 0.01757 0.008087 0.01380
#> Detection Prevalence 0.02496 0.02747 0.04908 0.02900 0.021194 0.04685
#> Balanced Accuracy 0.72160 0.78886 0.97086 0.87830 0.694701 0.68411
#> Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity 0.66129 0.65414 0.46821 0.72816 0.64794 0.49398
#> Specificity 0.97400 0.98624 0.98227 0.96067 0.98682 0.99854
#> Pos Pred Value 0.47674 0.64684 0.57244 0.35377 0.65530 0.94253
#> Neg Pred Value 0.98770 0.98667 0.97329 0.99170 0.98639 0.97599
#> Prevalence 0.03458 0.03709 0.04824 0.02872 0.03723 0.04629
#> Detection Rate 0.02287 0.02426 0.02259 0.02091 0.02412 0.02287
#> Detection Prevalence 0.04796 0.03751 0.03946 0.05912 0.03681 0.02426
#> Balanced Accuracy 0.81765 0.82019 0.72524 0.84441 0.81738 0.74626
confusionMatrix(as.factor(pred_test_nn2),
as.factor(actual_test)
)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction A B C D E F G H I K L M N O P Q R
#> A 331 0 0 0 0 0 0 0 39 0 0 18 63 0 0 0 0
#> B 0 368 0 8 0 1 0 0 2 0 0 0 0 0 0 0 0
#> C 0 0 289 0 0 20 0 0 0 0 12 0 0 28 0 0 0
#> D 0 37 0 204 0 0 0 0 0 0 0 0 0 0 0 0 0
#> E 0 0 0 0 473 0 0 0 0 0 0 39 0 21 0 0 0
#> F 0 0 21 0 0 225 0 0 0 5 27 0 0 14 5 0 0
#> G 0 0 0 0 0 0 238 20 0 0 0 0 0 0 0 0 0
#> H 0 0 0 0 0 0 42 396 0 0 0 0 0 20 0 0 0
#> I 0 0 0 0 0 0 0 0 199 35 0 2 0 0 2 0 0
#> K 0 0 0 0 0 0 0 0 2 210 0 0 0 0 0 0 0
#> L 0 0 0 0 0 0 0 0 0 0 148 0 0 0 0 0 19
#> M 0 0 0 0 0 0 0 0 0 0 0 262 39 0 0 2 0
#> N 0 0 0 0 0 0 0 20 0 0 0 21 124 0 0 21 0
#> O 0 0 0 0 0 0 2 0 0 0 0 0 42 162 0 0 0
#> P 0 0 0 0 0 0 3 0 0 8 0 0 0 0 320 0 0
#> Q 0 0 0 0 0 0 38 0 0 0 0 1 17 0 7 141 0
#> R 0 0 0 17 0 0 0 0 19 15 0 0 0 0 0 0 71
#> S 0 0 0 0 25 0 0 0 6 0 0 51 2 0 3 0 25
#> T 0 0 0 0 0 1 20 0 0 0 0 0 4 1 0 0 0
#> U 0 25 0 13 0 0 0 0 0 20 0 0 0 0 10 0 5
#> V 0 0 0 0 0 0 5 0 0 2 3 0 0 0 0 0 4
#> W 0 2 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0
#> X 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0
#> Y 0 0 0 0 0 0 0 0 21 22 19 0 0 0 0 0 20
#> Reference
#> Prediction S T U V W X Y
#> A 0 0 0 0 0 0 0
#> B 0 0 0 0 19 0 19
#> C 0 0 0 0 0 0 0
#> D 0 0 4 0 0 0 0
#> E 21 0 0 0 0 0 0
#> F 0 16 0 20 0 2 0
#> G 0 3 0 0 0 0 0
#> H 20 8 0 0 0 0 0
#> I 21 19 6 0 21 0 43
#> K 0 0 21 20 0 0 37
#> L 0 2 0 0 0 8 18
#> M 64 0 0 0 0 0 0
#> N 16 0 0 0 0 0 16
#> O 0 0 0 0 0 0 0
#> P 0 0 0 20 0 0 0
#> Q 0 0 0 0 0 0 0
#> R 0 1 14 0 7 19 24
#> S 104 0 0 0 0 42 5
#> T 0 155 0 0 0 19 21
#> U 0 0 160 19 0 0 0
#> V 0 2 20 224 5 0 0
#> W 0 2 0 29 154 19 0
#> X 0 40 0 0 0 158 0
#> Y 0 0 41 14 0 0 149
#>
#> Overall Statistics
#>
#> Accuracy : 0.7341
#> 95% CI : (0.7237, 0.7443)
#> No Information Rate : 0.0694
#> P-Value [Acc > NIR] : < 2.2e-16
#>
#> Kappa : 0.7216
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: A Class: B Class: C Class: D Class: E Class: F
#> Sensitivity 1.00000 0.85185 0.93226 0.83265 0.94980 0.91093
#> Specificity 0.98246 0.99273 0.99126 0.99408 0.98786 0.98412
#> Pos Pred Value 0.73392 0.88249 0.82808 0.83265 0.85379 0.67164
#> Neg Pred Value 1.00000 0.99053 0.99692 0.99408 0.99622 0.99678
#> Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
#> Detection Rate 0.04615 0.05131 0.04030 0.02844 0.06595 0.03137
#> Detection Prevalence 0.06288 0.05814 0.04866 0.03416 0.07724 0.04671
#> Balanced Accuracy 0.99123 0.92229 0.96176 0.91337 0.96883 0.94752
#> Class: G Class: H Class: I Class: K Class: L Class: M
#> Sensitivity 0.68391 0.90826 0.69097 0.63444 0.70813 0.66497
#> Specificity 0.99663 0.98664 0.97836 0.98831 0.99325 0.98451
#> Pos Pred Value 0.91188 0.81481 0.57184 0.72414 0.75897 0.71390
#> Neg Pred Value 0.98408 0.99402 0.98696 0.98242 0.99126 0.98060
#> Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
#> Detection Rate 0.03318 0.05521 0.02775 0.02928 0.02064 0.03653
#> Detection Prevalence 0.03639 0.06776 0.04852 0.04044 0.02719 0.05117
#> Balanced Accuracy 0.84027 0.94745 0.83466 0.81137 0.85069 0.82474
#> Class: N Class: O Class: P Class: Q Class: R Class: S
#> Sensitivity 0.42612 0.65854 0.92219 0.85976 0.49306 0.42276
#> Specificity 0.98634 0.99365 0.99546 0.99101 0.98349 0.97704
#> Pos Pred Value 0.56881 0.78641 0.91168 0.69118 0.37968 0.39544
#> Neg Pred Value 0.97599 0.98794 0.99604 0.99670 0.98955 0.97945
#> Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008 0.03430
#> Detection Rate 0.01729 0.02259 0.04462 0.01966 0.00990 0.01450
#> Detection Prevalence 0.03040 0.02872 0.04894 0.02844 0.02607 0.03667
#> Balanced Accuracy 0.70623 0.82609 0.95882 0.92538 0.73828 0.69990
#> Class: T Class: U Class: V Class: W Class: X Class: Y
#> Sensitivity 0.62500 0.60150 0.64740 0.74757 0.59176 0.44880
#> Specificity 0.99047 0.98668 0.99399 0.99053 0.99377 0.97997
#> Pos Pred Value 0.70136 0.63492 0.84528 0.70000 0.78607 0.52098
#> Neg Pred Value 0.98662 0.98468 0.98234 0.99252 0.98436 0.97342
#> Prevalence 0.03458 0.03709 0.04824 0.02872 0.03723 0.04629
#> Detection Rate 0.02161 0.02231 0.03123 0.02147 0.02203 0.02078
#> Detection Prevalence 0.03081 0.03514 0.03695 0.03067 0.02803 0.03988
#> Balanced Accuracy 0.80773 0.79409 0.82070 0.86905 0.79277 0.71438
Most classes have 0 false predictions. We may be able to create a very accurate model with very little error (0.0001% or smaller) if we provide a bigger and richer dataset.
For example: bigger images, sharper images, and more uniform ambient light and illumination.
A more uniform dataset, however, could make the model less robust, so we have to consider the purpose of the model.
If we use the model, for example, on webcam footage or everyday videos/images of sign language, we may want to teach the model with a large variation of images.
#save the model so it can be load again
save_model_tf(model_CNN, filepath = "data-output/model_CNN")
save_model_tf(model_1, filepath = "data-output/model_NN1")
save_model_tf(model_2, filepath = "data-output/model_NN2")