Fashion-MNIST Data Classification with a Neural Network

Data Description

Objective

This data set contains images of clothing articles from Zalando. It consists of 60,000 training examples and 10,000 test examples spread over 10 classes. Every example in both sets carries one of the following labels:

0 : T-shirt/top

1 : Trouser

2 : Pullover

3 : Dress

4 : Coat

5 : Sandal

6 : Shirt

7 : Sneaker

8 : Bag

9 : Ankle boot
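
The label table above can be kept as a lookup vector so that numeric labels are turned into readable names. This is a minimal base R sketch; the names `class_names` and `label_to_name` are illustrative and not part of the original analysis.

```r
# Class names in label order (label 0 first), copied from the table above.
class_names <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                 "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

# Hypothetical helper: labels start at 0 but R vectors start at 1, hence the + 1.
label_to_name <- function(label) {
  class_names[label + 1]
}

label_to_name(c(0, 5, 9))
```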

Libraries for the Neural Network

library(tidyverse)
library(keras)
library(dplyr)
library(rsample)
library(caret)

# checking tensorflow version 
tensorflow::tf_version()
## [1] '2.7'
# set seed
tensorflow::tf$random$set_seed(42)

theme_set(theme_minimal())

options(scipen = 999)

Reading the Data

fm_train <- read.csv("train.csv")
fm_test <- read.csv("test.csv")

dim(fm_train)
## [1] 60000   785

Inspecting the column names

colnames(fm_train)[c(1:5, 780:784)]
##  [1] "label"    "pixel1"   "pixel2"   "pixel3"   "pixel4"   "pixel779"
##  [7] "pixel780" "pixel781" "pixel782" "pixel783"

We treat the label column as the response variable and the columns pixel1 through pixel784 as the predictors. The listing above ends at pixel783 only because it indexes columns 780 to 784; column 785 holds pixel784.

Visualizing the first 25 examples

vizTrain <- function(input) {
    
    # side length of each square image (28 when there are 784 pixel columns)
    dimmax <- sqrt(ncol(input) - 1)
    
    # lay the plots out on a square grid large enough for all rows
    dimn <- ceiling(sqrt(nrow(input)))
    par(mfrow = c(dimn, dimn), mar = c(0.1, 0.1, 0.1, 
        0.1))
    
    
    for (i in seq_len(nrow(input))) {
        # rebuild the image row by row from the pixel columns
        m1 <- matrix(as.numeric(unlist(input[i, -1])), nrow = dimmax, 
            byrow = TRUE)
        # rotate so that image() draws the picture upright
        m1 <- t(apply(m1, 2, rev))
        
        image(1:dimmax, 1:dimmax, m1, col = grey.colors(255), 
            xaxt = "n", yaxt = "n")
        
        # take the label from `input` itself rather than from the global fm_train
        label_name <- switch(as.character(input[i, 1]),
                           "0" = "T-shirt",
                           "1" = "Trouser", 
                           "2" = "Pullover", 
                           "3" = "Dress", 
                           "4" = "Coat",
                           "5" = "Sandal", 
                           "6" = "Shirt", 
                           "7" = "Sneaker", 
                           "8" = "Bag",
                           "9" = "Boot")
        
        text(2, 20, col = "white", cex = 1.2, label_name)
    }
    
}
vizTrain(fm_train[1:25,])

Data Preparation

We prepare the data for the keras model by converting it to a matrix.

data_train <- data.matrix(fm_train)
data_test <- data.matrix(fm_test)

Training data from data_train

train_x <- data_train[,-1]/255 # drop column 1 ("label") and divide by 255 to scale pixel values to [0, 1]
train_y <- data_train[,1]

dim(train_x) # check dimensions
## [1] 60000   784

Test data from data_test

test_x <- data_test[,-1] /255
test_y <- data_test[,1]

dim(test_x)
## [1] 10000   784
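
The split into predictors and labels, together with the division by 255, can be checked on a tiny example. The sketch below uses a made-up matrix `toy` in place of data_train and assumes 8-bit pixel intensities (0 to 255), which is how Fashion-MNIST images are stored.

```r
# Toy stand-in for data_train: column 1 is the label, the rest are pixels.
toy <- matrix(c(3,   0, 128, 255,
                7, 255,  64,   0),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("label", "pixel1", "pixel2", "pixel3")))

toy_x <- toy[, -1] / 255   # predictors, rescaled to [0, 1]
toy_y <- toy[, 1]          # labels, left untouched

range(toy_x)
toy_y
```

After scaling, every predictor lies between 0 and 1, while the labels keep their original integer values.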

Converting the data to arrays

train_x <- array_reshape(x = train_x,
                         dim = dim(train_x))
test_x <- array_reshape(x = test_x,
                         dim = dim(test_x))
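
array_reshape() fills arrays in row-major (C-style) order, whereas base R fills them column by column. Because the calls above keep the original dimensions, the values should stay where they are; the difference only matters when the shape changes. The base R sketch below illustrates the two fill orders on a made-up vector `v` and does not call keras.

```r
v <- 1:6

# Column-major fill (base R default): values run down the columns.
col_major <- matrix(v, nrow = 2)

# Row-major fill: values run along the rows, as in C-style reshaping.
row_major <- matrix(v, nrow = 2, byrow = TRUE)

col_major
row_major
```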

We one-hot encode the response variable of each data set into 10 categories (matching the 10 class labels).

train_y <- to_categorical(y = train_y,
                          num_classes = 10)
test_y <- to_categorical(y = test_y,
                          num_classes = 10)
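
To make the encoding concrete, here is a minimal base R sketch of one-hot encoding. The helper `one_hot` is hypothetical and only mimics the idea for 0-based integer labels; it is not the keras implementation.

```r
# Hypothetical helper: one-hot encode labels in 0..(num_classes - 1).
one_hot <- function(y, num_classes) {
  out <- matrix(0, nrow = length(y), ncol = num_classes)
  # Put a 1 in column (label + 1) of each row, since labels are 0-based.
  out[cbind(seq_along(y), y + 1)] <- 1
  out
}

encoded <- one_hot(c(0, 2, 9), num_classes = 10)
encoded
```

Each row contains a single 1, placed in the column that corresponds to the label, so a label of 9 becomes a 1 in the tenth column.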

Modelling

Model 1

The first model has 3 dense layers (two hidden layers and an output layer) and uses the sgd optimizer with its default learning_rate.

model1<- keras_model_sequential() %>% 
  layer_dense(units = 128, input_shape = 784, activation = "relu") %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model1 %>%  compile(loss="categorical_crossentropy",
                 optimizer = optimizer_sgd(),
                 metrics="accuracy")
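
The softmax activation and the categorical cross-entropy loss named above can be written out in a few lines of base R. This is a textbook sketch for a single example with made-up scores; the function names are illustrative, and Keras itself adds details such as averaging over the batch.

```r
# Softmax: turn a vector of raw scores into probabilities that sum to 1.
softmax <- function(z) {
  e <- exp(z - max(z))   # subtracting max(z) avoids numerical overflow
  e / sum(e)
}

# Categorical cross-entropy for one example with a one-hot target y.
cross_entropy <- function(y, p) {
  -sum(y * log(p))
}

scores <- c(2, 1, 0.1)   # made-up raw outputs for 3 classes
p <- softmax(scores)
y <- c(1, 0, 0)          # the true class is the first one

p
cross_entropy(y, p)
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks as that probability approaches 1.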

summary(model1)
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  dense_2 (Dense)                    (None, 128)                     100480      
##  dense_1 (Dense)                    (None, 64)                      8256        
##  dense (Dense)                      (None, 10)                      650         
## ================================================================================
## Total params: 109,386
## Trainable params: 109,386
## Non-trainable params: 0
## ________________________________________________________________________________
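
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per unit. The helper `dense_params` below is an illustrative name, not a keras function.

```r
# Parameters of a dense layer: weights (n_in * n_out) plus biases (n_out).
dense_params <- function(n_in, n_out) {
  n_in * n_out + n_out
}

# Input size followed by the number of units in each layer of Model 1.
layer_sizes <- c(784, 128, 64, 10)
params <- dense_params(head(layer_sizes, -1), tail(layer_sizes, -1))

params
sum(params)
```

The three values match the Param # column (100480, 8256, 650), and their sum matches the total of 109,386.
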
history1 <- model1 %>% 
  fit(x = train_x, 
      y = train_y, 
      epochs = 30, 
      batch_size = 128)

history1
## 
## Final epoch (plot to see history):
##     loss: 0.3468
## accuracy: 0.8773
plot(history1)

pred1 <- predict(model1, test_x) %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()
confusionMatrix(data = pred1,
                reference = as.factor(fm_test[,1]))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 857   1  12  27   1   0 179   0   3   0
##          1   4 973   2  17   2   0   3   0   0   0
##          2  19   5 804  16  70   1 100   0   9   0
##          3  35  17  11 907  31   0  36   0   4   0
##          4   1   1 116  21 858   0  98   0   5   0
##          5   3   2   1   1   0 940   0  40   8  13
##          6  65   1  45   9  34   0 569   0   4   0
##          7   0   0   0   0   0  35   0 899   4  31
##          8  16   0   9   2   4   5  15   2 961   1
##          9   0   0   0   0   0  19   0  59   2 955
## 
## Overall Statistics
##                                                
##                Accuracy : 0.8723               
##                  95% CI : (0.8656, 0.8788)     
##     No Information Rate : 0.1                  
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.8581               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.8570   0.9730   0.8040   0.9070   0.8580   0.9400
## Specificity            0.9752   0.9969   0.9756   0.9851   0.9731   0.9924
## Pos Pred Value         0.7935   0.9720   0.7852   0.8713   0.7800   0.9325
## Neg Pred Value         0.9840   0.9970   0.9782   0.9896   0.9840   0.9933
## Prevalence             0.1000   0.1000   0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0857   0.0973   0.0804   0.0907   0.0858   0.0940
## Detection Prevalence   0.1080   0.1001   0.1024   0.1041   0.1100   0.1008
## Balanced Accuracy      0.9161   0.9849   0.8898   0.9461   0.9156   0.9662
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.5690   0.8990   0.9610   0.9550
## Specificity            0.9824   0.9922   0.9940   0.9911
## Pos Pred Value         0.7827   0.9278   0.9468   0.9227
## Neg Pred Value         0.9535   0.9888   0.9957   0.9950
## Prevalence             0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0569   0.0899   0.0961   0.0955
## Detection Prevalence   0.0727   0.0969   0.1015   0.1035
## Balanced Accuracy      0.7757   0.9456   0.9775   0.9731

Result: model 1 reaches an accuracy of 87.23% on the test data.
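
The overall accuracy and the per-class sensitivity follow directly from the diagonal of the confusion matrix. The sketch below copies the diagonal from the Model 1 output above; the variable names are illustrative.

```r
# Diagonal of the Model 1 confusion matrix (correct predictions per class 0..9).
correct <- c(857, 973, 804, 907, 858, 940, 569, 899, 961, 955)
n_test <- 10000       # total number of test examples
n_per_class <- 1000   # each class has a prevalence of 0.1

accuracy <- sum(correct) / n_test
sensitivity <- correct / n_per_class

accuracy
which.min(sensitivity) - 1   # label of the class with the lowest sensitivity
```

The weakest class is label 6 (Shirt), which the confusion matrix shows is most often mistaken for T-shirt/top, Pullover, and Coat.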

Model 2

For the second model we add one more hidden layer and switch to the adam optimizer, again leaving learning_rate at its default.

model2<- keras_model_sequential() %>% 
  layer_dense(units = 256, input_shape = 784, activation = "relu") %>% 
  layer_dense(units = 128, activation = "relu") %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model2 %>%  compile(loss="categorical_crossentropy",
                 optimizer = optimizer_adam(),
                 metrics="accuracy")

summary(model2)
## Model: "sequential_1"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  dense_6 (Dense)                    (None, 256)                     200960      
##  dense_5 (Dense)                    (None, 128)                     32896       
##  dense_4 (Dense)                    (None, 64)                      8256        
##  dense_3 (Dense)                    (None, 10)                      650         
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
history2 <-  model2 %>% 
  fit(x = train_x, 
      y = train_y, 
      epochs = 30, 
      batch_size = 128)

history2
## 
## Final epoch (plot to see history):
##     loss: 0.1211
## accuracy: 0.9531
plot(history2)

pred2 <- predict(model2, test_x) %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()
confusionMatrix(data = pred2,
                reference = as.factor(fm_test[,1]))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 832   1  11  16   1   0 108   0   2   1
##          1   2 989   0   8   0   0   2   0   0   0
##          2  14   1 804  11  49   0  75   0   2   0
##          3  20   8   9 907  22   1  21   0   2   0
##          4   3   0 105  29 869   0  70   0   4   0
##          5   1   1   0   1   0 954   0  14   0   5
##          6 119   0  67  26  57   0 719   0  11   0
##          7   0   0   0   0   0  26   0 940   3  21
##          8   9   0   4   2   2   2   5   1 975   0
##          9   0   0   0   0   0  17   0  45   1 973
## 
## Overall Statistics
##                                                
##                Accuracy : 0.8962               
##                  95% CI : (0.8901, 0.9021)     
##     No Information Rate : 0.1                  
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.8847               
##                                                
##  Mcnemar's Test P-Value : NA                   
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.8320   0.9890   0.8040   0.9070   0.8690   0.9540
## Specificity            0.9844   0.9987   0.9831   0.9908   0.9766   0.9976
## Pos Pred Value         0.8560   0.9880   0.8410   0.9162   0.8046   0.9775
## Neg Pred Value         0.9814   0.9988   0.9783   0.9897   0.9853   0.9949
## Prevalence             0.1000   0.1000   0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0832   0.0989   0.0804   0.0907   0.0869   0.0954
## Detection Prevalence   0.0972   0.1001   0.0956   0.0990   0.1080   0.0976
## Balanced Accuracy      0.9082   0.9938   0.8936   0.9489   0.9228   0.9758
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.7190   0.9400   0.9750   0.9730
## Specificity            0.9689   0.9944   0.9972   0.9930
## Pos Pred Value         0.7197   0.9495   0.9750   0.9392
## Neg Pred Value         0.9688   0.9933   0.9972   0.9970
## Prevalence             0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0719   0.0940   0.0975   0.0973
## Detection Prevalence   0.0999   0.0990   0.1000   0.1036
## Balanced Accuracy      0.8439   0.9672   0.9861   0.9830

Result: model 2 reaches an accuracy of 89.62% on the test data.
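
To see where the two models differ, the diagonals of both confusion matrices can be compared class by class. The numbers below are copied from the outputs above; the variable names are illustrative.

```r
# Correct predictions per class (labels 0..9), read off the two confusion matrices.
correct_m1 <- c(857, 973, 804, 907, 858, 940, 569, 899, 961, 955)
correct_m2 <- c(832, 989, 804, 907, 869, 954, 719, 940, 975, 973)

# Change in correctly classified test images per class (Model 2 minus Model 1).
delta <- setNames(correct_m2 - correct_m1, 0:9)

delta
sum(correct_m2) / 10000   # overall test accuracy of Model 2
```

Most of the gain comes from label 6 (Shirt), while label 0 (T-shirt/top) is the only class that gets worse. The overall value agrees with the Accuracy line in the Model 2 output.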

Conclusion

Based on the analysis above, model 2 performs better than model 1. After 30 epochs of fitting it reaches a training accuracy of 95.31% with a loss of 0.1211, compared with 87.73% and 0.3468 for model 1. Its predictions on the test data are also better, with an accuracy of 89.62% against 87.23%. The added hidden layer and the change of optimizer therefore likely contributed to the improvement, although the two changes were made together and their individual effects cannot be separated here.