We are going to use a convolutional neural network to classify the Fashion-MNIST dataset. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes (0-9).
Since the dataset is quite big, we will use the fread() function to read the training and test sets.
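fread() comes from the data.table package, and the modelling functions used below come from keras; assuming both packages are installed, we load them first:
library(data.table)  # fast CSV reading with fread()
library(keras)       # model definition, training, and evaluation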
mnist.train <- fread("fashionmnist/train.csv")
mnist.test <- fread("fashionmnist/test.csv")
dim(mnist.train)
## [1] 60000 785
dim(mnist.test)
## [1] 10000 785
The mnist.train dataset consists of 60,000 observations, while mnist.test consists of 10,000 observations. Both have 784 pixel columns and 1 label column, so each image is sqrt(784) = 28 pixels high by 28 pixels wide. Based on that information, let us set the variables for the number of rows and columns.
img_rows <- 28
img_cols <- 28
plot(table(mnist.train$label), main = "Proportion of Fashion MNIST Train")
plot(table(mnist.test$label), main = "Proportion of Fashion MNIST Test")
The data are evenly distributed among the 10 classes: 6,000 training images and 1,000 test images per class.
Let us transform the data into matrices (which are easier to index) and separate the pixels from the labels.
x_train <- as.matrix(mnist.train[, 2:dim(mnist.train)[2]])
y_train <- as.matrix(mnist.train[, 1])
dim(x_train) <- c(nrow(x_train), img_rows, img_cols, 1) # Unflattening the data
x_test <- as.matrix(mnist.test[, 2:dim(mnist.test)[2]])
y_test <- as.matrix(mnist.test[, 1])
dim(x_test) <- c(nrow(x_test), img_rows, img_cols, 1) # Unflattening the data
clothes.labels <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")
batch_size <- 128
num_classes <- 10
epochs <- 13
input_shape <- c(img_rows, img_cols, 1)
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
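After these steps the pixel values lie in [0, 1] and each label has been expanded into a one-hot vector of length 10. A quick illustration of what to_categorical does (the example labels here are arbitrary):
to_categorical(c(0, 3, 9), num_classes)  # each label becomes a row with a single 1
dim(y_train)                             # now 60000 x 10 instead of 60000 x 1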
model1 <- keras_model_sequential()
model1 %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')
summary(model1)
## Model: "sequential"
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d (Conv2D) (None, 26, 26, 32) 320
## ___________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## dropout (Dropout) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## flatten (Flatten) (None, 5408) 0
## ___________________________________________________________________________
## dense (Dense) (None, 64) 346176
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 64) 0
## ___________________________________________________________________________
## dense_1 (Dense) (None, 10) 650
## ===========================================================================
## Total params: 347,146
## Trainable params: 347,146
## Non-trainable params: 0
## ___________________________________________________________________________
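As a quick check, the parameter counts in the summary follow directly from the layer shapes:
3 * 3 * 1 * 32 + 32   # conv2d: 3x3 kernel, 1 input channel, 32 filters (+ biases) = 320
13 * 13 * 32          # size of the flattened feature map = 5408
5408 * 64 + 64        # dense: 5408 inputs x 64 units (+ biases) = 346176
64 * 10 + 10          # output layer: 64 x 10 (+ biases) = 650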
model1 %>%
  compile(loss = "categorical_crossentropy",
          optimizer = optimizer_adam(lr = 0.001),
          metrics = c("accuracy"))
set.seed(100)
model1 %>%
  fit(x_train,
      y_train,
      epochs = epochs,
      batch_size = batch_size)
scores <- model1 %>% evaluate(x_test, y_test, verbose = 0)
cat('Test loss:', scores[[1]], '\n')
## Test loss: 0.2378174
cat('Test accuracy:', scores[[2]], '\n')
## Test accuracy: 0.9138
For the first model, we get a test loss of 0.2378 and a test accuracy of 0.9138. That is already a decent model, but we will try to tune it further.
To tune the model, we will add another layer_conv_2d with filters = 64 and kernel_size = c(3,3), change the kernel size of the first convolutional layer to c(5,5), and double the number of units in the first layer_dense from 64 to 128.
model2 <- keras_model_sequential()
model2 %>%
  layer_conv_2d(filters = 32, kernel_size = c(5,5), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')
summary(model2)
## Model: "sequential_1"
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_1 (Conv2D) (None, 24, 24, 32) 832
## ___________________________________________________________________________
## conv2d_2 (Conv2D) (None, 22, 22, 64) 18496
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 11, 11, 64) 0
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 11, 11, 64) 0
## ___________________________________________________________________________
## flatten_1 (Flatten) (None, 7744) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 128) 991360
## ___________________________________________________________________________
## dropout_3 (Dropout) (None, 128) 0
## ___________________________________________________________________________
## dense_3 (Dense) (None, 10) 1290
## ===========================================================================
## Total params: 1,011,978
## Trainable params: 1,011,978
## Non-trainable params: 0
## ___________________________________________________________________________
We are also going to change the optimizer from optimizer_adam to optimizer_adadelta, keeping its default learning rate.
# compile model
model2 %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)
For the second model's training, we will also set verbose = 1 and pass the test set as validation data, so we can monitor validation loss and accuracy after each epoch.
set.seed(100)
model2 %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  validation_data = list(x_test, y_test)
)
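As a side note, fit() returns a training history that can be captured and plotted to compare training and validation metrics per epoch; a minimal sketch of the pattern (assuming we assign the result instead of discarding it):
history <- model2 %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  validation_data = list(x_test, y_test)
)
plot(history)  # loss and accuracy curves for training vs. validation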
scores2 <- model2 %>% evaluate(x_test, y_test, verbose = 0)
cat('Test loss:', scores2[[1]], '\n')
## Test loss: 0.2084355
cat('Test accuracy:', scores2[[2]], '\n')
## Test accuracy: 0.9251
As we can see, the test accuracy and loss are better than those of the first model (accuracy 0.9251 vs 0.9138, loss 0.2084 vs 0.2378). Therefore, we will use this model to predict our images.
class_pred <- model2 %>% predict_classes(x_test)
table(class_pred)
## class_pred
## 0 1 2 3 4 5 6 7 8 9
## 1034 996 865 1012 1069 985 1022 993 997 1027
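The predicted class counts are close to the expected 1,000 per class. To see where the remaining errors are, we can cross-tabulate the predictions against the original integer labels from mnist.test (a quick sketch):
conf_mat <- table(true = mnist.test$label, predicted = class_pred)  # confusion matrix
round(diag(conf_mat) / rowSums(conf_mat), 3)                        # per-class accuracy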
plotResults <- function(images) {
  x <- ceiling(sqrt(length(images)))
  par(mfrow = c(x, x), mar = c(.1, .1, .1, .1))
  for (i in images) {
    m <- matrix(x_test[i, , , ], nrow = 28, byrow = TRUE)
    m <- apply(m, 2, rev)
    image(t(m), col = grey.colors(255), axes = FALSE)
    predicted_label <- class_pred[i]
    true_label <- which.max(y_test[i, ]) - 1  # recover the integer label from the one-hot row
    if (predicted_label == true_label) {
      color <- "green"   # correct prediction
    } else {
      color <- "red"     # misclassified
    }
    text(0.5, 0.1, col = color, cex = 1.5, clothes.labels[predicted_label + 1])
  }
}
plotResults(1:1000)
In the plotted grid, misclassified images are labeled in red. Consistent with the test accuracy of about 92.5%, only a small fraction of the 1,000 plotted images are mislabeled, indicating that the second model predicts well.
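Finally, since the second model performs well, it can be saved for later reuse (the file name here is just an example):
save_model_hdf5(model2, "fashion_mnist_cnn.h5")       # save architecture + weights
# model2 <- load_model_hdf5("fashion_mnist_cnn.h5")   # reload the model later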