Preparation

We are going to use a neural network model to classify the Fashion-MNIST dataset. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes (0-9).

Read the Data

Since the dataset is quite big, we will use the fread() function from data.table to read the train and test datasets quickly.

library(data.table) # provides fread() for fast CSV reading
library(keras)      # deep learning framework used throughout this post

mnist.train <- fread("fashionmnist/train.csv")
mnist.test <- fread("fashionmnist/test.csv")

Simple Exploratory Data Analysis

Picture Size

dim(mnist.train)
## [1] 60000   785
dim(mnist.test)
## [1] 10000   785

The mnist.train dataset consists of 60,000 observations, while mnist.test consists of 10,000 observations. Both of them have 784 pixel columns and 1 label column. Therefore each picture is sqrt(784) = 28 pixels in height x 28 pixels in width. Based on that information, let us set the variables for rows and columns.

img_rows <- 28
img_cols <- 28

Data Proportion

plot(table(mnist.train$label), main = "Proportion of Fashion MNIST Train")

plot(table(mnist.test$label), main = "Proportion of Fashion MNIST Test")

The data are equally distributed among the 10 classes, with 6,000 training and 1,000 test images per class.

Data Transformation

Let us transform the data to matrices (which are easier to index) and separate the pixels from the labels.

x_train <- as.matrix(mnist.train[, 2:dim(mnist.train)[2]])
y_train <- as.matrix(mnist.train[, 1])
dim(x_train) <- c(nrow(x_train), img_rows, img_cols, 1) # Unflattening the data

x_test <- as.matrix(mnist.test[, 2:dim(mnist.test)[2]])
y_test <- as.matrix(mnist.test[, 1])
dim(x_test) <- c(nrow(x_test), img_rows, img_cols, 1) # Unflattening the data
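
As a quick sanity check, we can render the first training image from the unflattened 4-D array. This is a minimal sketch that mirrors the plotting logic used later in this post (the t() and rev calls only orient the image upright):

m <- matrix(x_train[1, , , ], nrow = img_rows, byrow = TRUE)
m <- apply(m, 2, rev)                     # flip vertically so the image is upright
image(t(m), col = grey.colors(255), axes = FALSE)
y_train[1]                                # its integer label (0-9)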

Category Labelling

clothes.labels <-c( "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

Data Modelling

Set Up Tuning Variables

batch_size <- 128
num_classes <- 10
epochs <- 13

input_shape <- c(img_rows, img_cols, 1)

Data Scaling

x_train <- x_train / 255
x_test <- x_test / 255
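
The raw pixel intensities range from 0 to 255; dividing by 255 rescales them to [0, 1], which generally helps gradient-based training. We can verify the new range:

range(x_train)  # should print 0 1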

Convert Class Vectors to Binary Class Matrices

y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
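
to_categorical() turns each integer label into a length-10 one-hot row, with a single 1 in the column of the true class. Inspecting the first row illustrates this:

y_train[1, ]  # one-hot encoding of the first training label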

Model Building

Model Architecture - 1st Model

model1 <- keras_model_sequential()
model1 %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')

summary(model1)
## Model: "sequential"
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d (Conv2D)                  (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d (MaxPooling2D)     (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## dropout (Dropout)                (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## flatten (Flatten)                (None, 5408)                  0           
## ___________________________________________________________________________
## dense (Dense)                    (None, 64)                    346176      
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 64)                    0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 10)                    650         
## ===========================================================================
## Total params: 347,146
## Trainable params: 347,146
## Non-trainable params: 0
## ___________________________________________________________________________
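
The parameter counts can be verified by hand: the convolution layer has 32 x (3 x 3 x 1) weights plus 32 biases = 320, the first dense layer has 5,408 x 64 weights plus 64 biases = 346,176, and the output layer has 64 x 10 weights plus 10 biases = 650.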

Model Compile - 1st Model

model1 %>% 
          compile(loss = "categorical_crossentropy", 
                  optimizer = optimizer_adam(lr = 0.001), 
                  metrics = c("accuracy"))

Model Training - 1st Model

set.seed(100)
model1 %>% 
          fit(x_train, 
              y_train, 
              epochs = epochs, 
              batch_size = batch_size)
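
Note that set.seed() seeds R's RNG but not TensorFlow's, so repeated runs can still differ slightly. Depending on the keras/TensorFlow version, use_session_with_seed(), called at the top of the script before building the model, can make runs fully reproducible; a minimal sketch:

# Hypothetical reproducibility setup; disables GPU and CPU parallelism
use_session_with_seed(100)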

Model Evaluation - 1st Model

scores <- model1 %>% evaluate(x_test, 
                              y_test, 
                              verbose = 0)


cat('Test loss:', scores[[1]], '\n')
## Test loss: 0.2378174
cat('Test accuracy:', scores[[2]], '\n')
## Test accuracy: 0.9138

For the first model, we get a test loss of 0.2378 and a test accuracy of 0.9138. It's a fairly good model, but we will try to tune it further.

Model Architecture - 2nd Model

To tune the model, we will add another layer_conv_2d with filters = 64 and kernel_size = c(3,3), change the kernel size of the first layer to c(5,5), and double the number of units in the first layer_dense.

model2 <- keras_model_sequential()
model2 %>%
  layer_conv_2d(filters = 32, kernel_size = c(5,5), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')

summary(model2)
## Model: "sequential_1"
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_1 (Conv2D)                (None, 24, 24, 32)            832         
## ___________________________________________________________________________
## conv2d_2 (Conv2D)                (None, 22, 22, 64)            18496       
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)   (None, 11, 11, 64)            0           
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 11, 11, 64)            0           
## ___________________________________________________________________________
## flatten_1 (Flatten)              (None, 7744)                  0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 128)                   991360      
## ___________________________________________________________________________
## dropout_3 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 10)                    1290        
## ===========================================================================
## Total params: 1,011,978
## Trainable params: 1,011,978
## Non-trainable params: 0
## ___________________________________________________________________________

Model Compile - 2nd Model

We are going to change the optimizer from optimizer_adam to optimizer_adadelta, using its default learning rate.

# compile model
model2 %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)
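
If an explicit learning rate is wanted instead, optimizer_adadelta() accepts one (the argument is named lr in older keras releases and learning_rate in newer ones). A hedged variant, not the compile call used for the results below:

# Hypothetical variant with an explicit learning rate
model2 %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_adadelta(lr = 0.05),
  metrics = c('accuracy')
)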

Model Training - 2nd Model

We will add verbose = 1 and validation data to the 2nd model's training.

set.seed(100)
model2 %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  validation_data = list(x_test, y_test)
)
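
Since we now monitor validation data, we could also stop training automatically when the validation loss stops improving. A minimal sketch using keras's built-in early-stopping callback (not used for the results below):

# Hypothetical variant: stop if val_loss does not improve for 2 epochs
model2 %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  validation_data = list(x_test, y_test),
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 2))
)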

Model Evaluation - 2nd Model

scores2 <- model2 %>% evaluate(
  x_test, y_test, verbose = 0
)
cat('Test loss:', scores2[[1]], '\n')
## Test loss: 0.2084355
cat('Test accuracy:', scores2[[2]], '\n')
## Test accuracy: 0.9251

As we can see, the test accuracy and loss are better than the 1st model's. Therefore, we will use this model to predict our images.

Predicting Images

Create Class Prediction

class_pred <- model2 %>% predict_classes(x_test)
table(class_pred)
## class_pred
##    0    1    2    3    4    5    6    7    8    9 
## 1034  996  865 1012 1069  985 1022  993  997 1027
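
Class frequencies alone don't show which classes get confused with which. A confusion matrix against the true labels makes this visible:

# Confusion matrix: rows = predicted class, columns = actual class
table(Predicted = class_pred, Actual = mnist.test$label)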

Function to Create Images with Text

plotResults <- function(images) {
  x <- ceiling(sqrt(length(images)))
  par(mfrow = c(x, x), mar = c(.1, .1, .1, .1))
  
  for (i in images) {
    m <- matrix(x_test[i, , , ], nrow = 28, byrow = TRUE)
    m <- apply(m, 2, rev)                     # flip vertically so the image is upright
    image(t(m), col = grey.colors(255), axes = FALSE)
    
    predicted_label <- class_pred[i]
    true_label <- which.max(y_test[i, ]) - 1  # recover the class from the one-hot row
    
    # green when the prediction is correct, red when it is wrong
    if (predicted_label == true_label) {
      color <- "green"
    } else {
      color <- "red"
    }
    
    text(0.5, 0.1, col = color, cex = 1.5, clothes.labels[predicted_label + 1])
  }
}

Plotting the Result

plotResults(1:1000)

As we can see, out of 1,000 images only 7 are shown in red, i.e. misclassified (0.7%). This indicates that our 2nd model is able to predict well.
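
To inspect only the errors, we can pass the indices of the misclassified images instead; a minimal sketch:

wrong <- which(class_pred != mnist.test$label)
plotResults(wrong[1:min(25, length(wrong))])  # show up to 25 misclassified images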