1 Explanation

1.1 Brief

Artificial neural networks are used in many fields, such as image recognition and handwriting classification. In this notebook, we will build neural network (NN) models for a classification task. The machine will learn a model from images of handwritten digits and predict which digit a given picture shows. After that, we will check the accuracy and try to improve the model.

1.2 Data’s Point of View

We have two datasets: train and test. The train data is the resource from which the machine learns and creates the model. The test data is used to check whether the model is good enough.

We will use the MNIST dataset (Modified National Institute of Standards and Technology database), which contains tens of thousands of digitized images of handwritten digits. We will create a machine learning model that can recognize the digits 0 to 9 by learning from the pixel values provided.

2 Data Preparation

2.1 Load Library

Before we work with the neural network, let’s load the libraries we need:

library(data.table) #fread
library(keras) 
## Warning: package 'keras' was built under R version 4.2.3
library(caret)      #confusionMatrix
## Warning: package 'caret' was built under R version 4.2.3
## Loading required package: ggplot2
## Loading required package: lattice
library(ggplot2)
library(lattice)

2.2 Input Data

train_mnist <- read.csv("data_input/mnist/train.csv")
test_mnist <- read.csv("data_input/mnist/test.csv")

3 Exploratory Data Analysis

colnames(train_mnist)[c(1:5, 780:784)]
##  [1] "label"    "pixel0"   "pixel1"   "pixel2"   "pixel3"   "pixel778"
##  [7] "pixel779" "pixel780" "pixel781" "pixel782"
table(train_mnist$label)
## 
##    0    1    2    3    4    5    6    7    8    9 
## 4132 4684 4177 4351 4072 3795 4137 4401 4063 4188
barplot(table(train_mnist$label))

From the plot above, we can see that the classes are not perfectly balanced: the number of observations differs slightly between labels, from 3,795 for digit 5 to 4,684 for digit 1.
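To put numbers on the imbalance, the counts printed above can be turned into proportions. This is a minimal sketch in base R; the vector `label_counts` is typed in by hand from the `table()` output and is not an object from the original analysis.

```r
# Label counts copied by hand from the table() output above (hypothetical helper vector)
label_counts <- c("0" = 4132, "1" = 4684, "2" = 4177, "3" = 4351, "4" = 4072,
                  "5" = 3795, "6" = 4137, "7" = 4401, "8" = 4063, "9" = 4188)

# Share of each digit in the training data
label_share <- label_counts / sum(label_counts)
round(label_share, 3)

# Ratio between the most frequent and the least frequent digit
imbalance_ratio <- max(label_counts) / min(label_counts)
imbalance_ratio
```

A ratio of 1 would mean perfectly balanced classes. Here the most frequent digit appears roughly 1.23 times as often as the least frequent one, which is a mild imbalance.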

Let’s display one image manually by reshaping the first row of pixel values into a 28 x 28 matrix:

m1 <- matrix(train_mnist[1,2:ncol(train_mnist)], nrow = 28, ncol = 28, byrow = T)
m1 <- apply(m1,2,as.numeric)
m1 <- apply(m1, 2, rev)
m1 <- t(m1)
image(m1)
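The flip and transpose steps can be checked on a tiny example. The sketch below assumes a hypothetical 3 x 3 "image" stored row by row in the vector `pixels`, in the same way one MNIST row stores a 28 x 28 image. Because `image()` draws the first matrix index along the x-axis and starts the y-axis at the bottom, the matrix has to be flipped vertically and transposed to appear upright.

```r
# Hypothetical 3 x 3 image stored row by row (like one row of MNIST pixels)
pixels <- 1:9

# Rebuild the image: each matrix row is one image row
m <- matrix(pixels, nrow = 3, byrow = TRUE)

# Reverse the order of the rows (vertical flip)
m_flipped <- apply(m, 2, rev)

# Transpose so that image() maps image columns to the x-axis
m_plot <- t(m_flipped)
m_plot

# image(m_plot) would now draw the picture upright:
# the first row of m_plot is the left-most column, listed from bottom to top
```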

Rather than showing the images one by one, we will use the vizTrain function to display several images together with their labels.

vizTrain <- function(input) {
    
    # Pixel columns are everything except the label column
    pixel_cols <- setdiff(names(input), "label")
    dimmax <- sqrt(length(pixel_cols))   # 28 for 784 pixels
    
    # Arrange the plots in a near-square grid
    dimn <- ceiling(sqrt(nrow(input)))
    par(mfrow = c(dimn, dimn), mar = c(0.1, 0.1, 0.1, 0.1))
    
    for (i in seq_len(nrow(input))) {
        # Rebuild the image from one row of pixel values
        m1 <- matrix(as.numeric(unlist(input[i, pixel_cols])), nrow = dimmax, 
            byrow = TRUE)
        # Flip and transpose so that image() draws the digit upright
        m1 <- t(apply(m1, 2, rev))
        
        image(1:dimmax, 1:dimmax, m1, col = grey.colors(255), 
            xaxt = "n", yaxt = "n")
        
        # Take the label from the row being drawn (input), not from train_mnist
        label <- as.character(input[i, "label"])
        
        text(2, 20, col = "white", cex = 1.2, label)
    }
    
}

Show the first 25 pictures from the train data together with their labels:

vizTrain(train_mnist[1:25,])

Next, we prepare the data for modelling with Keras. First, convert the data frames into matrices:

train <- data.matrix(train_mnist)
test <- data.matrix(test_mnist)

Separate the predictors (x) from the target (y / label). As a check, we also show the dimensions of each matrix.

train_x <- train[,-1]
train_y <- train[,1]
dim(train_x) 
## [1] 42000   784
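test_x <- test[,]
# Note: test.csv contains only the 784 pixel columns (no label), so the line
# below extracts pixel0 rather than a true target
test_y <- test[,1]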
dim(test_x)
## [1] 28000   784

4 Keras

Keras is a high-level API for building deep learning models. It can run on top of several popular back-ends, e.g. TensorFlow, Theano, and CNTK. In this material, we will learn the basic workflow of using Keras with TensorFlow as the back-end for several deep learning tasks.

Keras expects its input as an array, so we first reshape the data matrices into arrays:

train_x_keras <- array_reshape(train_x,c(nrow(train_x), ncol(train_x)))
test_x_keras <- array_reshape(test_x, c(nrow(test_x), ncol(test_x)))

dim(test_x_keras)
## [1] 28000   784

Next, we scale the pixel intensities from the 0-255 range to the 0-1 range by dividing by 255:

train_x_keras <- train_x_keras/255

test_x_keras <- test_x_keras /255
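The effect of dividing by 255 can be seen on a small example. This is a minimal sketch in base R with a hypothetical matrix `raw_pixels`; it is not part of the original data.

```r
# Hypothetical 2 x 3 block of raw pixel intensities (0 = darkest, 255 = brightest)
raw_pixels <- matrix(c(0, 51, 102, 153, 204, 255), nrow = 2, byrow = TRUE)

# Dividing by the maximum possible intensity maps every value into [0, 1]
scaled_pixels <- raw_pixels / 255

range(raw_pixels)
range(scaled_pixels)
```

Note that this is a fixed rescaling by a known maximum, not a standardization to zero mean and unit variance. Keeping the inputs in a small, common range generally makes gradient-based training better behaved.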

Convert the label (y / target) into a categorical (one-hot) matrix. There are 10 labels (0-9), so we define 10 categories:

train_y_keras <- to_categorical(train_y,10)

test_y_keras <- to_categorical(test_y,10)
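To make the categorical encoding concrete, the sketch below builds the same kind of 0/1 matrix in base R. The helper `one_hot` is hypothetical and written only for illustration; it is not a Keras function, and it assumes integer labels starting at 0.

```r
# Base R sketch of one-hot encoding for labels 0..(num_classes - 1).
# `one_hot` is a hypothetical helper, not part of keras.
one_hot <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Column index is label + 1 because R indexes from 1
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

example_labels <- c(3, 0, 9)
example_encoded <- one_hot(example_labels, 10)
example_encoded
```

Each row contains a single 1 in the column that corresponds to its digit, which is the target format expected by the "categorical_crossentropy" loss used below.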

4.1 1st model

model <- keras_model_sequential()
model %>% layer_dense(units = 128, activation = "relu", input_shape = c(784)) %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model %>% compile(loss = "categorical_crossentropy", 
    optimizer = optimizer_sgd(), metrics = c("accuracy"))
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense_2 (Dense)                     (None, 128)                     100480      
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 64)                      8256        
## ________________________________________________________________________________
## dense (Dense)                       (None, 10)                      650         
## ================================================================================
## Total params: 109,386
## Trainable params: 109,386
## Non-trainable params: 0
## ________________________________________________________________________________
#hist <-  model %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
#saveRDS(hist, "history.RDS")
hist <- readRDS("history.RDS")
plot(hist)

hist$metrics$acc[30]
## [1] 0.9567619

After 30 epochs, the first model reaches a training accuracy of about 95.7%.
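The parameter counts in the summary of the first model follow from the layer sizes: a dense layer has one weight per input-output pair plus one bias per unit. The sketch below reproduces them in base R; `dense_params` is a hypothetical helper written for this illustration.

```r
# Parameters of a dense layer: weights (inputs x units) plus one bias per unit
# `dense_params` is a hypothetical helper, not a keras function
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

layer_inputs <- c(784, 128, 64)   # number of inputs to each dense layer
layer_units  <- c(128, 64, 10)    # number of units in each dense layer

params_per_layer <- dense_params(layer_inputs, layer_units)
params_per_layer        # 100480 8256 650
sum(params_per_layer)   # 109386
```

These values match the "Param #" column and the "Total params" line of the summary.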

4.2 2nd model

The accuracy of the model is already good, but let’s try to improve it by adding one more hidden layer. Create model1:

model1 <- keras_model_sequential()
model1 %>% layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>% 
  layer_dense(units = 128, activation = "relu") %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model1 %>% compile(loss = "categorical_crossentropy", 
    optimizer = optimizer_sgd(), metrics = c("accuracy"))
summary(model1)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense_6 (Dense)                     (None, 256)                     200960      
## ________________________________________________________________________________
## dense_5 (Dense)                     (None, 128)                     32896       
## ________________________________________________________________________________
## dense_4 (Dense)                     (None, 64)                      8256        
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 10)                      650         
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
# history1 <-  model1 %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
# saveRDS(history1, "history1.RDS")
history1 <- readRDS("history1.RDS")
plot(history1)

history1$metrics$acc[30]
## [1] 0.9714048

The training accuracy of the improved model is 97.14%.

4.3 3rd model

Let’s try changing the optimizer from SGD to RMSprop:

model2 <- keras_model_sequential()
model2 %>% layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>% 
  layer_dense(units = 128, activation = "relu") %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model2 %>% compile(loss = "categorical_crossentropy", 
    optimizer = 'rmsprop', metrics = c("accuracy"))
summary(model2)
## Model: "sequential_2"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense_10 (Dense)                    (None, 256)                     200960      
## ________________________________________________________________________________
## dense_9 (Dense)                     (None, 128)                     32896       
## ________________________________________________________________________________
## dense_8 (Dense)                     (None, 64)                      8256        
## ________________________________________________________________________________
## dense_7 (Dense)                     (None, 10)                      650         
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
# history2 <-  model2 %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
# saveRDS(history2, "history2.RDS")
history2 <- readRDS("history2.RDS")
plot(history2)

history2$metrics$acc[30]
## [1] 0.9991667

After switching the optimizer to ‘rmsprop’, the training accuracy improves to about 99.9%.
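To give an intuition for why the choice of optimizer can matter, the sketch below compares the plain SGD update with an RMSprop-style update on a toy two-parameter loss. It is only an illustration in base R: the loss, the functions `sgd_step` and `rmsprop_step`, and the values of `lr`, `rho` and `eps` are assumptions chosen for this example, not the settings Keras used above.

```r
# Toy loss f(w) = 0.5 * (100 * w1^2 + w2^2): steep in w1, flat in w2
grad <- function(w) c(100 * w[1], w[2])

# Plain SGD: the step is proportional to the raw gradient
sgd_step <- function(w, g, lr) w - lr * g

# RMSprop-style update: each gradient is divided by a running estimate
# of its own magnitude, so every parameter takes steps of similar size
rmsprop_step <- function(w, g, v, lr, rho = 0.9, eps = 1e-7) {
  v <- rho * v + (1 - rho) * g^2        # running average of squared gradients
  w <- w - lr * g / (sqrt(v) + eps)     # per-parameter scaled step
  list(w = w, v = v)
}

lr <- 0.001          # illustrative learning rate
w_sgd <- c(1, 1)
w_rms <- c(1, 1)
v <- c(0, 0)

for (step in 1:100) {
  w_sgd <- sgd_step(w_sgd, grad(w_sgd), lr)
  state <- rmsprop_step(w_rms, grad(w_rms), v, lr)
  w_rms <- state$w
  v <- state$v
}

w_sgd
w_rms
```

In this toy run, SGD shrinks the steep coordinate much faster than the flat one, while RMSprop moves both coordinates by nearly the same amount per step. That per-parameter scaling is one plausible reason an adaptive optimizer can train faster on some problems, although it does not guarantee better results in general.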

4.4 Prediction

Now we predict the labels of the test data:

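# Note: readRDS() above restores only the training history, not the fitted
# weights, so `model` must be trained with fit() in this session before predicting
prediction_test <- model %>% 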
  predict(test_x_keras) %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()
 
prediction_test[1:10]
##  [1] 5 5 5 7 5 5 5 5 5 7
## Levels: 0 1 2 4 5 6 7 8
vizTest <- function(input) {
    
    # Pixel columns are everything except the appended label column
    # (in the test data the label is the last column, not the first)
    pixel_cols <- setdiff(names(input), "label")
    dimmax <- sqrt(length(pixel_cols))   # 28 for 784 pixels
    
    # Arrange the plots in a near-square grid
    dimn <- ceiling(sqrt(nrow(input)))
    par(mfrow = c(dimn, dimn), mar = c(0.1, 0.1, 0.1, 0.1))
    
    for (i in seq_len(nrow(input))) {
        # Rebuild the image from one row of pixel values
        m1 <- matrix(as.numeric(unlist(input[i, pixel_cols])), nrow = dimmax, 
            byrow = TRUE)
        # Flip and transpose so that image() draws the digit upright
        m1 <- t(apply(m1, 2, rev))
        
        image(1:dimmax, 1:dimmax, m1, col = grey.colors(255), 
            xaxt = "n", yaxt = "n")
        
        # Predicted label of the row being drawn
        label <- as.character(input[i, "label"])
        
        text(2, 20, col = "white", cex = 1.2, label)
    }
    
}
test_mnist$label <- prediction_test
vizTest(test_mnist[1:25,]) 
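Because the test file contains only pixel columns, the predictions above can only be checked by eye. When true labels are available, for example for a part of the train data that is held out from training, accuracy can be computed from a confusion matrix. The sketch below uses base R with two hypothetical vectors, `actual` and `predicted`, that are made up for illustration.

```r
# Hypothetical true and predicted digits for 10 held-out images
actual    <- factor(c(1, 0, 1, 4, 0, 0, 7, 3, 5, 3), levels = 0:9)
predicted <- factor(c(1, 0, 1, 4, 0, 0, 7, 3, 5, 8), levels = 0:9)

# Confusion matrix: rows = predicted digit, columns = actual digit
conf_mat <- table(Predicted = predicted, Actual = actual)
conf_mat

# Accuracy = share of predictions that fall on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

With real predictions, `confusionMatrix(predicted, actual)` from the caret package loaded earlier reports the same table along with per-class statistics.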

5 Conclusion

The 1st model performs well and is not overfitting. In that model, we used 2 hidden layers (128 and 64 neurons) and 1 output layer (10 neurons). The training accuracy of the three models is about 95.7%, 97.1%, and 99.9%, respectively.

For the test data, the result is quite good: only 1 of the 25 displayed images is misclassified.