Artificial neural networks are used in many fields, such as image recognition and handwriting classification. In this notebook, we will develop a neural network model for a classification task: given a picture of a handwritten digit, the machine will learn a model from the data and predict which digit the picture shows. We will then check the accuracy and try to improve the model.
We already have two datasets: train and test. The training data will be used as the resource for the machine to learn and build the model, while the test data will be used to check whether the model is good enough.
We will use the MNIST dataset (Modified National Institute of Standards and Technology database), which contains thousands of digitized images of handwritten digits. We will create a machine learning model that can recognize the digits 0 to 9 by learning from this data.
Before we work with the neural network, let's load the required libraries:
library(data.table) # fread
library(keras)
## Warning: package 'keras' was built under R version 4.2.3
library(caret) # confusionMatrix
## Warning: package 'caret' was built under R version 4.2.3
## Loading required package: ggplot2
## Loading required package: lattice
library(ggplot2)
library(lattice)

train_mnist <- read.csv("data_input/mnist/train.csv")
test_mnist <- read.csv("data_input/mnist/test.csv")

colnames(train_mnist)[c(1:5, 780:784)]
## [1] "label"    "pixel0"   "pixel1"   "pixel2"   "pixel3"   "pixel778"
## [7] "pixel779" "pixel780" "pixel781" "pixel782"
table(train_mnist$label)
##
## 0 1 2 3 4 5 6 7 8 9
## 4132 4684 4177 4351 4072 3795 4137 4401 4063 4188
barplot(table(train_mnist$label))
From the plot above, we can see that the data is not perfectly balanced: each label has a slightly different number of observations, although the classes are of roughly similar size.
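To quantify this, a quick proportion table can be computed; this is only a sanity check and is not part of the original modelling workflow. Each digit should account for roughly 9 to 11 percent of the observations.

round(prop.table(table(train_mnist$label)), 3)  # class proportions per digit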
We would like to display one observation manually as an image:
m1 <- matrix(train_mnist[1, 2:ncol(train_mnist)], nrow = 28, ncol = 28, byrow = T)  # reshape the 784 pixels of the first row into a 28 x 28 grid
m1 <- apply(m1, 2, as.numeric)  # convert to numeric
m1 <- apply(m1, 2, rev)         # flip vertically so the digit is not upside down
m1 <- t(m1)                     # transpose for image()
image(m1)

Rather than showing the images one by one, we will use the vizTrain function below to display several images together with their labels.
vizTrain <- function(input) {
    dimmax <- sqrt(ncol(train_mnist[, -1]))   # 28: images are 28 x 28 pixels
    dimn <- ceiling(sqrt(nrow(input)))        # grid size for the plot panels
    par(mfrow = c(dimn, dimn), mar = c(0.1, 0.1, 0.1, 0.1))
    for (i in 1:nrow(input)) {
        m1 <- matrix(input[i, 2:ncol(input)], nrow = dimmax, byrow = T)
        m1 <- apply(m1, 2, as.numeric)
        m1 <- t(apply(m1, 2, rev))
        image(1:dimmax, 1:dimmax, m1, col = grey.colors(255),
              xaxt = "n", yaxt = "n")
        cat <- sapply(as.character(train_mnist[i, 1]), switch,
                      "0" = "0", "1" = "1", "2" = "2", "3" = "3", "4" = "4",
                      "5" = "5", "6" = "6", "7" = "7", "8" = "8", "9" = "9")
        text(2, 20, col = "white", cex = 1.2, cat)  # print the label on the image
    }
}

Show the first few pictures from the training data together with their labels:
vizTrain(train_mnist[1:25, ])

We will now prepare the data to build the model with Keras. First, convert the data into matrices:
train <- data.matrix(train_mnist)
test <- data.matrix(test_mnist)

Next, separate the predictors (x) from the target (y / label). As a check, we also show the dimensions of each matrix.
train_x <- train[, -1]
train_y <- train[, 1]
dim(train_x)
## [1] 42000   784

# Note: the test file has no label column (784 pixel columns only),
# so test_y below is not a true label; it is just the first pixel column.
test_x <- test[, ]
test_y <- test[, 1]
dim(test_x)
## [1] 28000   784
Keras is a high-level API for working with deep learning frameworks. It can run on top of several popular back-ends, e.g. TensorFlow, Theano, and CNTK. In this material, we will learn the basic workflow of using Keras with TensorFlow as the back-end for several deep learning tasks.
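If the keras R package and its TensorFlow back-end are not installed yet, a one-time setup along the following lines is usually required (this is only a sketch and is not run here; the exact versions installed depend on your machine):

install.packages("keras")  # R interface to Keras
library(keras)
install_keras()            # installs TensorFlow as the default back-end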
In Keras, we first have to convert the data matrices into arrays.
Transform the data into array:
train_x_keras <- array_reshape(train_x,c(nrow(train_x), ncol(train_x)))
test_x_keras <- array_reshape(test_x, c(nrow(test_x), ncol(test_x)))
dim(test_x_keras)
## [1] 28000   784
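A small aside, not part of the original workflow: array_reshape() fills the array row by row (C-style ordering), whereas base R's dim<-() works column by column. In our case the shape is unchanged, so both would give the same result, but the distinction matters when reshaping to other dimensions (for example back to 28 x 28 images). A tiny illustration with made-up numbers:

m <- matrix(1:6, nrow = 2, byrow = TRUE)
array_reshape(m, c(3, 2))  # filled in reading order: rows 1 2 / 3 4 / 5 6
dim(m) <- c(3, 2)          # base R reinterprets the same data column by column
m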
We standardize the pixel values by dividing by 255, so that each pixel falls between 0 and 1:
train_x_keras <- train_x_keras/255
test_x_keras <- test_x_keras / 255

Next, change the label (y / target) into a categorical (one-hot) encoding. We have 10 labels (0-9), so we define 10 categories:
train_y_keras <- to_categorical(train_y,10)
test_y_keras <- to_categorical(test_y, 10)

Now we define the network architecture: two hidden layers (128 and 64 neurons with ReLU activation) and a 10-neuron softmax output layer.

model <- keras_model_sequential()
model %>% layer_dense(units = 128, activation = "relu", input_shape = c(784)) %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 10, activation = "softmax")
model %>% compile(loss = "categorical_crossentropy",
                  optimizer = optimizer_sgd(),
                  metrics = c("accuracy"))

summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_2 (Dense) (None, 128) 100480
## ________________________________________________________________________________
## dense_1 (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dense (Dense) (None, 10) 650
## ================================================================================
## Total params: 109,386
## Trainable params: 109,386
## Non-trainable params: 0
## ________________________________________________________________________________
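As a quick check, the "Param #" column above can be reproduced by hand: a dense layer has (number of inputs x number of units) weights plus one bias per unit.

784 * 128 + 128  # 100480 (784 input pixels -> 128-unit hidden layer)
128 * 64 + 64    # 8256   (128-unit -> 64-unit hidden layer)
64 * 10 + 10     # 650    (64-unit hidden layer -> 10-unit output layer)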
# hist <- model %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
# saveRDS(hist, "history.RDS")
hist <- readRDS("history.RDS")
plot(hist)

hist$metrics$acc[30]
## [1] 0.9567619
From the model above, we obtained about 95.67% accuracy on the training data.
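This figure is accuracy on the training data itself. One option for monitoring overfitting during training, not used in the run above, is to hold out part of the training data as a validation set via fit()'s validation_split argument. A sketch (hist_val is just an illustrative name, and running this would retrain the model):

hist_val <- model %>% fit(train_x_keras, train_y_keras,
                          epochs = 30, batch_size = 128,
                          validation_split = 0.2)  # hold out 20% of the rows for validation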
The model's accuracy is already good, but let's try to improve it by adding one more hidden layer. Create model1:
model1 <- keras_model_sequential()
model1 %>% layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 10, activation = "softmax")
model1 %>% compile(loss = "categorical_crossentropy",
                   optimizer = optimizer_sgd(),
                   metrics = c("accuracy"))

summary(model1)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_6 (Dense) (None, 256) 200960
## ________________________________________________________________________________
## dense_5 (Dense) (None, 128) 32896
## ________________________________________________________________________________
## dense_4 (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dense_3 (Dense) (None, 10) 650
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
# history1 <- model1 %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
# saveRDS(history1, "history1.RDS")
history1 <- readRDS("history1.RDS")
plot(history1)

history1$metrics$acc[30]
## [1] 0.9714048
The improved model's training accuracy is 97.14%.
Let's also try changing the optimizer to rmsprop:
model2 <- keras_model_sequential()
model2 %>% layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 10, activation = "softmax")
model2 %>% compile(loss = "categorical_crossentropy",
                   optimizer = "rmsprop",
                   metrics = c("accuracy"))

summary(model2)
## Model: "sequential_2"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_10 (Dense) (None, 256) 200960
## ________________________________________________________________________________
## dense_9 (Dense) (None, 128) 32896
## ________________________________________________________________________________
## dense_8 (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dense_7 (Dense) (None, 10) 650
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
# history2 <- model2 %>% fit(train_x_keras, train_y_keras, epochs = 30, batch_size = 128)
# saveRDS(history2, "history2.RDS")
history2 <- readRDS("history2.RDS")
plot(history2)

history2$metrics$acc[30]
## [1] 0.9991667
After switching the optimizer to 'rmsprop', the model's training accuracy improves to 99.91%.
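A possible next experiment, not run in this notebook: instead of passing the optimizer by name, construct it explicitly so the learning rate can be tuned (in recent keras versions the argument is learning_rate; older versions use lr). A sketch like this would normally be applied to a freshly defined model before fitting.

model2 %>% compile(loss = "categorical_crossentropy",
                   optimizer = optimizer_rmsprop(learning_rate = 0.001),  # 0.001 is the usual rmsprop default; try smaller or larger values
                   metrics = c("accuracy"))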
We will now predict on the test data (note that these predictions use the first model, model):
prediction_test <- model %>%
predict(test_x_keras) %>%
k_argmax() %>%
as.array() %>%
as.factor()
prediction_test[1:10]
## [1] 5 5 5 7 5 5 5 5 5 7
## Levels: 0 1 2 4 5 6 7 8
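The test file has no true labels, so test-set accuracy cannot be computed directly here. As a rough (and optimistic, since the model has already seen this data) check, we could compare predictions on the training set against the known training labels using caret's confusionMatrix(), which is why caret was loaded at the top. This is only a sketch; pred_train is an illustrative name.

pred_train <- model %>%
    predict(train_x_keras) %>%
    k_argmax() %>%
    as.array()
confusionMatrix(factor(pred_train, levels = 0:9),
                factor(train_y, levels = 0:9))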
To visualize the test predictions, we define a vizTest function analogous to vizTrain. Note that in the test data the pixels are columns 1 to 784 and the predicted label will be appended as column 785, so the pixel columns are selected accordingly.

vizTest <- function(input) {
    dimmax <- sqrt(ncol(test_mnist[, -1]))
    dimn <- ceiling(sqrt(nrow(input)))
    par(mfrow = c(dimn, dimn), mar = c(0.1, 0.1, 0.1, 0.1))
    for (i in 1:nrow(input)) {
        m1 <- matrix(input[i, 1:(ncol(input) - 1)], nrow = dimmax, byrow = T)  # pixel columns only
        m1 <- apply(m1, 2, as.numeric)
        m1 <- t(apply(m1, 2, rev))
        image(1:dimmax, 1:dimmax, m1, col = grey.colors(255),
              xaxt = "n", yaxt = "n")
        cat <- sapply(as.character(test_mnist[i, 785]), switch,   # column 785 holds the predicted label
                      "0" = "0", "1" = "1", "2" = "2", "3" = "3", "4" = "4",
                      "5" = "5", "6" = "6", "7" = "7", "8" = "8", "9" = "9")
        text(2, 20, col = "white", cex = 1.2, cat)
    }
}

test_mnist$label <- prediction_test
vizTest(test_mnist[1:25, ])

The first model is good and does not appear to overfit. In that model, we used 2 hidden layers (128 and 64 neurons) and 1 output layer (10 neurons). The training accuracies of the three models were 95.67%, 97.14%, and 99.91%, respectively.
For the test data, the result looks quite good: only 1 of the 25 displayed images appears to be misclassified.