CIFAR-10 Image Classification With CNN

Sunarto Rusli

3/28/2022

Convolutional Neural Network

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and learns to differentiate one from another. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field, and a collection of such overlapping fields covers the entire visual area.
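To make the receptive field concrete, here is a minimal base-R sketch of the sliding-window operation a convolutional layer performs on a single-channel image (no padding, stride 1). Like most deep-learning libraries it computes a cross-correlation, i.e. the kernel is not flipped. The function name `conv2d_valid` is hypothetical and only for illustration; Keras's `layer_conv_2d()` additionally handles several input channels, many filters, a bias term and padding.

```r
# Slide a kernel over a single-channel image (no padding, stride 1).
# Each output value depends only on the kernel-sized patch beneath it:
# that patch is the output's receptive field.
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  out_h <- nrow(img) - kh + 1
  out_w <- ncol(img) - kw + 1
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)  # weighted sum over the patch
    }
  }
  out
}

img <- matrix(1:16, nrow = 4, byrow = TRUE)
kernel <- matrix(1, nrow = 2, ncol = 2)  # 2 x 2 box filter
conv2d_valid(img, kernel)
```

The result is a 3 × 3 matrix; its top-left entry is 1 + 2 + 5 + 6 = 14, computed only from the top-left 2 × 2 patch of the input.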

CIFAR-10

As its name suggests, the CIFAR-10 dataset contains images from 10 categories: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship and Truck. There are 60,000 images in total, 6,000 per class. Our goal is to build a CNN model that learns from the training set and makes predictions on the test set.

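library(keras)    # dataset_cifar10(), to_categorical(), model layers
library(dplyr)    # case_when(), used in decode() below
library(EBImage)  # Image(), tile() and friends, used to plot sample images

cifar10 <- dataset_cifar10()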
## Loaded Tensorflow version 2.0.0
ctrain_x <- cifar10$train$x

ctrain_y <- to_categorical(cifar10$train$y, num_classes = 10)

ctest_x <- cifar10$test$x

ctest_y <- cifar10$test$y
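`to_categorical()` turns the integer labels 0 to 9 into one-hot rows: a 1 in the column of the true class and 0 elsewhere. The base-R sketch below shows the same transformation; the helper `one_hot` is hypothetical and assumes 0-based integer labels, as in CIFAR-10.

```r
# One-hot encode 0-based integer labels into an n x num_classes matrix
one_hot <- function(y, num_classes) {
  y <- as.integer(y)                      # labels arrive as an n x 1 matrix
  out <- matrix(0, nrow = length(y), ncol = num_classes)
  out[cbind(seq_along(y), y + 1L)] <- 1   # shift to R's 1-based columns
  out
}

one_hot(c(6, 9, 0), num_classes = 10)
```

The three rows have their single 1 in columns 7, 10 and 1 respectively (frog, truck, airplane).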

Exploratory Data Analysis

All images are 32×32 pixels in color, with three (RGB) channels. There are 50,000 training images and 10,000 test images.

dim(ctrain_x)
## [1] 50000    32    32     3

Let us take a peek at some of the images.

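# Pick 20 random image indices (not pixel values) from the training set
train_sample <- sample(dim(ctrain_x)[1], 20)

fig_img <- list()
for (i in 1:20) {
  fig_mat <- ctrain_x[train_sample[i], , , ]
  # EBImage stores width first, so swap the first two axes before plotting
  fig_img[[i]] <- EBImage::normalize(
    EBImage::Image(EBImage::transpose(fig_mat), dim = c(32, 32, 3), colormode = "Color"))
}
fig_img_comb <- EBImage::combine(fig_img[1:20])
fig_img_obj  <- EBImage::tile(fig_img_comb, 5)
plot(fig_img_obj, all = TRUE)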

Data Preprocessing

Class Distribution

All classes are balanced: each class makes up 10% of the training set.

cifar10$train$y %>% 
  table() %>% 
  prop.table()
## .
##   0   1   2   3   4   5   6   7   8   9 
## 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

We will feed the augmented training images to the model in batches of 30.

batch_size <- 30

Augmentation

We will use image augmentation to increase the variety of the training data without acquiring new images. The idea is to train the model not only on the original images but also on modified versions of them (flipped, rotated, zoomed, cropped, and so on), which tends to make the model more robust.

For this dataset we use the following augmentation settings:

  • Rescale the pixel values by dividing them by 255

  • Randomly flip images horizontally

  • Randomly flip images vertically

  • Randomly rotate images by up to 45 degrees in either direction

  • Randomly vary the brightness by a factor between 0.3 and 1.2

  • Randomly zoom in or out by up to 25% (a zoom factor between 0.75 and 1.25)

image_gen <- image_data_generator(rescale = 1/255,
                                  horizontal_flip = T,
                                  vertical_flip = T,
                                  rotation_range = 45,
                                  brightness_range=c(0.3,1.2),
                                  zoom_range = 0.25
                                  )
train_image_array_gen <- flow_images_from_data(ctrain_x,
                                               ctrain_y,
                                               batch_size = batch_size,
                                               generator = image_gen)
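The flips and the brightness shift are simple array operations. The base-R sketch below applies them to one image stored as height × width × channels, which is how a single CIFAR-10 image `ctrain_x[i, , , ]` is laid out. The helper names are hypothetical, and the brightness step is a simplification that assumes pixel values are multiplied by a random factor and clipped at 255; Keras's own implementation differs in detail.

```r
# Horizontal flip: reverse the width (column) axis
flip_horizontal <- function(img) img[, dim(img)[2]:1, , drop = FALSE]

# Vertical flip: reverse the height (row) axis
flip_vertical <- function(img) img[dim(img)[1]:1, , , drop = FALSE]

# Brightness shift (simplified): scale by a factor, clip to the 0-255 range
adjust_brightness <- function(img, factor) {
  out <- img * factor
  out[out > 255] <- 255
  out
}

# Toy 2 x 3 single-channel "image"
img <- array(c(10, 20, 30, 40, 50, 200), dim = c(2, 3, 1))

flipped  <- flip_horizontal(img)
brighter <- adjust_brightness(img, runif(1, min = 0.3, max = 1.2))
scaled   <- brighter / 255  # the rescale = 1/255 step
```

`drop = FALSE` keeps the channel axis even when it has length 1, so the flipped array has the same shape as the input.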

Model Building with TensorFlow

For our initial model, we will start with the following layers:

  • Convolutional layer to extract features from 2D image with relu activation function and 32 filters

  • Max Pooling layer to downsample the image features

  • Convolutional layer to extract features from 2D image with relu activation function and 64 filters

  • Max Pooling layer to downsample the image features

  • Flattening layer to flatten data from 2D array to 1D array

  • Dense layer to capture more information with relu activation function and 64 units

  • Dense output layer with softmax activation function and 10 units (one per target class)
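The ReLU activation and the 2 × 2 max-pooling step in the list above can be sketched in a few lines of base R. This is an illustrative single-channel version with hypothetical helper names; `layer_max_pooling_2d(pool_size = c(2,2))` applies the same operation to every channel and, with its default stride equal to the pool size, halves the height and width, which is why the 32 × 32 feature maps become 16 × 16 and then 8 × 8.

```r
# ReLU: negative values become 0, positive values pass through
relu <- function(x) pmax(x, 0)

# 2 x 2 max pooling with stride 2 on one channel (even height/width assumed)
max_pool_2x2 <- function(x) {
  out <- matrix(0, nrow(x) / 2, ncol(x) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- max(x[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

fmap <- matrix(c(-1,  2,  0, 4,
                  3, -5,  1, 1,
                  0,  0, -2, 6,
                  7,  1,  3, 0), nrow = 4, byrow = TRUE)
max_pool_2x2(relu(fmap))
```

Each output cell keeps the strongest activation in its 2 × 2 block, so the 4 × 4 map shrinks to a 2 × 2 map with rows (3, 4) and (7, 6).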

# cmodel1 <- keras_model_sequential() %>% 
#   layer_conv_2d(filters = 32,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%
#   layer_max_pooling_2d(pool_size = c(2,2)) %>%
#   layer_conv_2d(filters = 64,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%
#   layer_max_pooling_2d(pool_size = c(2,2)) %>%
#   layer_flatten() %>%
#   layer_dense(units = 64,
#               activation = "relu") %>%
#   layer_dense(name = "Output",
#               units = 10,
#               activation = "softmax")

cmodel1 <- load_model_hdf5("cmodel1.hdf5")

cmodel1
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d_1 (Conv2D)                   (None, 32, 32, 32)              896         
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)      (None, 16, 16, 32)              0           
## ________________________________________________________________________________
## conv2d (Conv2D)                     (None, 16, 16, 64)              18496       
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 8, 8, 64)                0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 4096)                    0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 64)                      262208      
## ________________________________________________________________________________
## Output (Dense)                      (None, 10)                      650         
## ================================================================================
## Total params: 282,250
## Trainable params: 282,250
## Non-trainable params: 0
## ________________________________________________________________________________
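The `Param #` column follows directly from the layer sizes: a convolutional layer has (kernel height × kernel width × input channels + 1) × filters parameters (the +1 is the bias), and a dense layer has (inputs + 1) × units. With `padding = "same"` the convolutions keep the spatial size and each pooling halves it, so the flatten layer receives 8 × 8 × 64 = 4,096 values. A small base-R check, with hypothetical helper functions:

```r
conv_params  <- function(k, in_ch, filters) (k * k * in_ch + 1) * filters
dense_params <- function(n_in, units) (n_in + 1) * units

params <- c(
  conv2d_1 = conv_params(3, 3, 32),         # 896
  conv2d   = conv_params(3, 32, 64),        # 18,496
  dense    = dense_params(8 * 8 * 64, 64),  # 262,208
  Output   = dense_params(64, 10)           # 650
)
sum(params)  # 282250, matching "Total params" in the summary
```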

Model Fitting

We compile the model with categorical cross-entropy as the loss function, the Adam optimizer with a learning rate of 0.001, and accuracy as the evaluation metric.

# cmodel1 %>% 
#    compile(
#     loss = "categorical_crossentropy",
#     optimizer = optimizer_adam(learning_rate = 0.001),
#     metrics = "accuracy")
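For one-hot targets, categorical cross-entropy is the negative log of the probability that the softmax output assigns to the true class. A base-R sketch follows; the function names are hypothetical, and the clipping constant is an assumption modelled on the small epsilon Keras uses to avoid log(0). A model that guesses uniformly over ten classes scores log(10) ≈ 2.303, a useful baseline when reading the training loss.

```r
# Softmax turns raw scores into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtracting the max avoids overflow
  e / sum(e)
}

# Categorical cross-entropy for a one-hot target vector
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)  # clip to avoid log(0)
  -sum(y_true * log(y_pred))
}

probs <- softmax(c(2.0, 1.0, 0.1))
sum(probs)  # 1, up to floating-point rounding

categorical_crossentropy(c(0, 1, 0), c(0.2, 0.7, 0.1))  # -log(0.7), about 0.357

# Uniform guessing over 10 classes: log(10), about 2.303
categorical_crossentropy(c(1, rep(0, 9)), rep(0.1, 10))
```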

We fit the model on the augmented data generator for 20 epochs.

# chistory1 <- cmodel1 %>%
#   fit_generator(
#   train_image_array_gen,
#   steps_per_epoch = as.integer(50000 / batch_size),
#   epochs = 20
# )
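Because the generator yields batches indefinitely, `steps_per_epoch` decides how many batches count as one epoch. With 50,000 training images and batches of 30, the arithmetic works out as follows; the 1,666 that the history printout reports as "samples" is this number of steps.

```r
n_train    <- 50000
batch_size <- 30
steps      <- as.integer(n_train / batch_size)  # 1666 (1666.67 truncated)
steps * batch_size                              # 49980 images per epoch
```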


chistory1 <- readRDS("chistory1.RDS")

plot(chistory1)
## `geom_smooth()` using formula 'y ~ x'

chistory1
## Trained on 1,666 samples (batch_size=NULL, epochs=20)
## Final epoch (plot to see history):
##     loss: 1.144
## accuracy: 0.6001

The initial model reaches about 60% accuracy on the augmented training data after 20 epochs.

Model Evaluation

We evaluate the model by predicting the classes of the test set.

pred_test <- cmodel1 %>% 
  predict(ctest_x/255) %>% # scale pixel values to [0, 1], as in training
  k_argmax() %>% 
  as.array()

head(pred_test,10)
##  [1] 3 9 8 8 6 6 1 6 3 1
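`k_argmax()` returns 0-based class indices (0 = airplane, ..., 9 = truck), matching the CIFAR-10 labels. In base R, the equivalent on a matrix of predicted probabilities is `max.col()`, which is 1-based, so 1 must be subtracted. A sketch with illustrative (not model-generated) probabilities for two images:

```r
# Each row holds one image's predicted probabilities for the 10 classes
probs <- rbind(
  c(0.01, 0.02, 0.05, 0.60, 0.05, 0.20, 0.03, 0.02, 0.01, 0.01),
  c(0.05, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.15, 0.70)
)
max.col(probs, ties.method = "first") - 1  # 0-based indices, like k_argmax()
```

The two rows map to classes 3 (cat) and 9 (truck).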

Next, we decode the predicted indices into class names.

decode <- function(x){
  case_when(x == 0 ~ "airplane",
            x == 1 ~ "automobile",
            x == 2 ~ "bird",
            x == 3 ~ "cat",
            x == 4 ~ "deer",
            x == 5 ~ "dog",
            x == 6 ~ "frog",
            x == 7 ~ "horse",
            x == 8 ~ "ship",
            x == 9 ~ "truck"
            )
}

pred_test <- sapply(pred_test, decode)

head(pred_test)
## [1] "cat"   "truck" "ship"  "ship"  "frog"  "frog"
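Since the labels are consecutive integers starting at 0, `decode()` can also be written as a lookup into a character vector, which is already vectorized and so needs no `sapply()`. This is an alternative sketch (the name `decode_lookup` is hypothetical), not what the analysis above used:

```r
class_names <- c("airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck")

# Shift the 0-based label to R's 1-based vector index
decode_lookup <- function(x) class_names[as.integer(x) + 1]

decode_lookup(c(3, 9, 8, 8, 6, 6))
```

This reproduces the six decoded predictions printed above: "cat", "truck", "ship", "ship", "frog", "frog".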

The confusion matrix below shows 60.5% accuracy on the test set. This is close to the 60% training accuracy, so the model does not overfit, but the accuracy is still low. We will try to tune the model.

ctest_y <- sapply(cifar10$test$y,decode)


caret::confusionMatrix(as.factor(pred_test),
                       as.factor(ctest_y))
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   airplane automobile bird cat deer dog frog horse ship truck
##   airplane        616          6   80  21   24  18    5    18   82    12
##   automobile       26        653    9   7    0   8    8     5   42    59
##   bird             24          6  422  56   66  32   15    17   18     3
##   cat              11          7   62 327   23 151   40    47    9    10
##   deer             15          2  153  87  625  85   44   118   10     6
##   dog               7         12   39 140   10 439    7    39   12     9
##   frog             11         18  124 189  146  99  828    42   27    37
##   horse            35         13   53  74   69  87    8   650   24    18
##   ship            141         12   31  34   20  23   17    11  672    26
##   truck           114        271   27  65   17  58   28    53  104   820
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6052          
##                  95% CI : (0.5955, 0.6148)
##     No Information Rate : 0.1             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5613          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: airplane Class: automobile Class: bird Class: cat
## Sensitivity                   0.6160            0.6530      0.4220     0.3270
## Specificity                   0.9704            0.9818      0.9737     0.9600
## Pos Pred Value                0.6984            0.7993      0.6404     0.4760
## Neg Pred Value                0.9579            0.9622      0.9381     0.9277
## Prevalence                    0.1000            0.1000      0.1000     0.1000
## Detection Rate                0.0616            0.0653      0.0422     0.0327
## Detection Prevalence          0.0882            0.0817      0.0659     0.0687
## Balanced Accuracy             0.7932            0.8174      0.6978     0.6435
##                      Class: deer Class: dog Class: frog Class: horse
## Sensitivity               0.6250     0.4390      0.8280       0.6500
## Specificity               0.9422     0.9694      0.9230       0.9577
## Pos Pred Value            0.5459     0.6148      0.5444       0.6305
## Neg Pred Value            0.9577     0.9396      0.9797       0.9610
## Prevalence                0.1000     0.1000      0.1000       0.1000
## Detection Rate            0.0625     0.0439      0.0828       0.0650
## Detection Prevalence      0.1145     0.0714      0.1521       0.1031
## Balanced Accuracy         0.7836     0.7042      0.8755       0.8038
##                      Class: ship Class: truck
## Sensitivity               0.6720       0.8200
## Specificity               0.9650       0.9181
## Pos Pred Value            0.6809       0.5267
## Neg Pred Value            0.9636       0.9787
## Prevalence                0.1000       0.1000
## Detection Rate            0.0672       0.0820
## Detection Prevalence      0.0987       0.1557
## Balanced Accuracy         0.8185       0.8691
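The Kappa value above compares the observed accuracy p_o with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). Because the test set has exactly 1,000 images per class, p_e is 0.1 regardless of the predictions, so kappa = (0.6052 - 0.1) / 0.9 ≈ 0.5613. The base-R sketch below reproduces this from the confusion matrix copied from the output; the helper name `cohen_kappa` is hypothetical.

```r
classes <- c("airplane", "automobile", "bird", "cat", "deer",
             "dog", "frog", "horse", "ship", "truck")

# Rows = predictions, columns = reference labels (copied from the output above)
cm1 <- matrix(c(
  616,   6,  80,  21,  24,  18,   5,  18,  82,  12,
   26, 653,   9,   7,   0,   8,   8,   5,  42,  59,
   24,   6, 422,  56,  66,  32,  15,  17,  18,   3,
   11,   7,  62, 327,  23, 151,  40,  47,   9,  10,
   15,   2, 153,  87, 625,  85,  44, 118,  10,   6,
    7,  12,  39, 140,  10, 439,   7,  39,  12,   9,
   11,  18, 124, 189, 146,  99, 828,  42,  27,  37,
   35,  13,  53,  74,  69,  87,   8, 650,  24,  18,
  141,  12,  31,  34,  20,  23,  17,  11, 672,  26,
  114, 271,  27,  65,  17,  58,  28,  53, 104, 820),
  nrow = 10, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

cohen_kappa <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                     # observed accuracy
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

sum(diag(cm1)) / sum(cm1)  # 0.6052
cohen_kappa(cm1)           # about 0.5613
```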

Model Tuning

We tune the model with the following layers:

  • Convolutional layer to extract features from 2D image with relu activation function and 32 filters

  • Convolutional layer to extract features from 2D image with relu activation function and 32 filters

  • Max Pooling layer to downsample the image features

  • Dropout layer to reduce overfitting

  • Convolutional layer to extract features from 2D image with relu activation function and 64 filters

  • Max Pooling layer to downsample the image features

  • Dropout layer to reduce overfitting

  • Convolutional layer to extract features from 2D image with relu activation function and 128 filters

  • Max Pooling layer to downsample the image features

  • Dropout layer to reduce overfitting

  • Flattening layer to flatten data from 2D array to 1D array

  • Dense layer to capture more information with relu activation function and 1024 units

  • Dense layer to capture more information with relu activation function and 512 units

  • Dropout layer to reduce overfitting

  • Dense output layer with softmax activation function and 10 units (one per target class)
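The dropout layers in the list above randomly zero a fraction of their inputs during training only. Keras uses "inverted" dropout: the surviving values are scaled by 1 / (1 - rate) so that the expected activation is unchanged, and at prediction time the layer passes its input through untouched. A base-R sketch with the rate of 0.25 used in the model code; the function `dropout` is hypothetical.

```r
dropout <- function(x, rate = 0.25, training = TRUE) {
  if (!training) return(x)             # inference: identity
  keep <- runif(length(x)) >= rate     # keep each unit with probability 1 - rate
  x * keep / (1 - rate)                # rescale the survivors
}

set.seed(42)
x <- rep(1, 100000)
y <- dropout(x, rate = 0.25)
mean(y == 0)  # close to 0.25
mean(y)       # close to 1: the expected activation is preserved
```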

# cmodel3 <- keras_model_sequential() %>% 
#   layer_conv_2d(filters = 32,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%
#   layer_conv_2d(filters = 32,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%  
#   layer_max_pooling_2d(pool_size = c(2,2)) %>%
#   layer_dropout(0.25) %>%
#   layer_conv_2d(filters = 64,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%
#   layer_max_pooling_2d(pool_size = c(2,2)) %>%
#   layer_dropout(0.25) %>%
#   layer_conv_2d(filters = 128,
#                 kernel_size = c(3,3),
#                 padding = "same",
#                 activation = "relu",
#                 input_shape = c(32,32, 3)
#                 ) %>%
#   layer_max_pooling_2d(pool_size = c(2,2)) %>%
#   layer_dropout(0.25) %>%
#   layer_flatten() %>%
#   layer_dense(units = 1024,
#               activation = "relu") %>%
#   layer_dense(units = 512,
#               activation = "relu") %>%
#   layer_dropout(0.25) %>%
#   layer_dense(name = "Output",
#               units = 10,
#               activation = "softmax")

cmodel3 <- load_model_hdf5("cmodel3.hdf5")

cmodel3
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d_7 (Conv2D)                   (None, 32, 32, 32)              896         
## ________________________________________________________________________________
## conv2d_6 (Conv2D)                   (None, 32, 32, 32)              9248        
## ________________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D)      (None, 16, 16, 32)              0           
## ________________________________________________________________________________
## dropout_3 (Dropout)                 (None, 16, 16, 32)              0           
## ________________________________________________________________________________
## conv2d_5 (Conv2D)                   (None, 16, 16, 64)              18496       
## ________________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D)      (None, 8, 8, 64)                0           
## ________________________________________________________________________________
## dropout_2 (Dropout)                 (None, 8, 8, 64)                0           
## ________________________________________________________________________________
## conv2d_4 (Conv2D)                   (None, 8, 8, 128)               73856       
## ________________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D)      (None, 4, 4, 128)               0           
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 4, 4, 128)               0           
## ________________________________________________________________________________
## flatten_1 (Flatten)                 (None, 2048)                    0           
## ________________________________________________________________________________
## dense_3 (Dense)                     (None, 1024)                    2098176     
## ________________________________________________________________________________
## dense_2 (Dense)                     (None, 512)                     524800      
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 512)                     0           
## ________________________________________________________________________________
## Output (Dense)                      (None, 10)                      5130        
## ================================================================================
## Total params: 2,730,602
## Trainable params: 2,730,602
## Non-trainable params: 0
## ________________________________________________________________________________
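The summary also shows where the tuned model's size comes from. A convolution has (k × k × input channels + 1) × filters parameters and a dense layer has (inputs + 1) × units. With `padding = "same"`, only the three pooling layers shrink the feature maps (32 to 16 to 8 to 4), so the flatten layer outputs 4 × 4 × 128 = 2,048 values, and the first dense layer alone holds roughly three quarters of all weights. A base-R check with hypothetical helper functions:

```r
conv_params  <- function(k, in_ch, filters) (k * k * in_ch + 1) * filters
dense_params <- function(n_in, units) (n_in + 1) * units

side <- 32 / 2^3  # three 2 x 2 poolings: 32 -> 16 -> 8 -> 4
params <- c(
  conv2d_7 = conv_params(3, 3, 32),
  conv2d_6 = conv_params(3, 32, 32),
  conv2d_5 = conv_params(3, 32, 64),
  conv2d_4 = conv_params(3, 64, 128),
  dense_3  = dense_params(side * side * 128, 1024),
  dense_2  = dense_params(1024, 512),
  Output   = dense_params(512, 10)
)
sum(params)                        # 2730602, matching "Total params"
params[["dense_3"]] / sum(params)  # about 0.77
```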

Model Fitting

As before, we use categorical cross-entropy as the loss function, the Adam optimizer with a learning rate of 0.001, and accuracy as the metric.

# cmodel3 %>% 
#   compile(
#           loss = "categorical_crossentropy",
#           optimizer = optimizer_adam(learning_rate = 0.001),
#           metrics = "accuracy" )

We fit the model for 30 epochs with a batch size of 1,000. This time we call fit() directly on the rescaled training data instead of fit_generator(), so no augmentation is applied.

# chistory3 <- cmodel3 %>% 
#   fit(ctrain_x/255,
#       ctrain_y,
#       epochs = 30,
#       batch_size = 1000)


chistory3 <- readRDS("chistory3.RDS")

plot(chistory3)
## `geom_smooth()` using formula 'y ~ x'

chistory3
## Trained on 50,000 samples (batch_size=1,000, epochs=30)
## Final epoch (plot to see history):
##     loss: 0.2121
## accuracy: 0.924

The tuned model reaches 92.4% accuracy on the training set.

Model Evaluation

As before, we evaluate the tuned model by predicting the test set.

pred_test <- cmodel3 %>% 
  predict(ctest_x/255) %>% 
  k_argmax()

pred_test <- sapply(as.array(pred_test), decode)

ctest_y <- sapply(cifar10$test$y,decode)


caret::confusionMatrix(as.factor(pred_test),
                       as.factor(ctest_y))
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   airplane automobile bird cat deer dog frog horse ship truck
##   airplane        809          7   43  13    8   5    6     9   25    19
##   automobile       10        894    6   4    1   3    2     1   18    64
##   bird             56          1  715  50   46  32   25    15   11     5
##   cat              21          9   47 650   39 173   46    32   16    11
##   deer             11          5   77  60  804  42   30    54    5     4
##   dog               1          4   36 118   17 661   10    25    2     2
##   frog              7          5   45  56   37  28  866     3    3     3
##   horse             4          0   20  28   42  49    7   853    4    10
##   ship             55         20    7  10    4   4    6     4  901    20
##   truck            26         55    4  11    2   3    2     4   15   862
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8015          
##                  95% CI : (0.7935, 0.8093)
##     No Information Rate : 0.1             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7794          
##                                           
##  Mcnemar's Test P-Value : 1.996e-06       
## 
## Statistics by Class:
## 
##                      Class: airplane Class: automobile Class: bird Class: cat
## Sensitivity                   0.8090            0.8940      0.7150     0.6500
## Specificity                   0.9850            0.9879      0.9732     0.9562
## Pos Pred Value                0.8570            0.8913      0.7479     0.6226
## Neg Pred Value                0.9789            0.9882      0.9685     0.9609
## Prevalence                    0.1000            0.1000      0.1000     0.1000
## Detection Rate                0.0809            0.0894      0.0715     0.0650
## Detection Prevalence          0.0944            0.1003      0.0956     0.1044
## Balanced Accuracy             0.8970            0.9409      0.8441     0.8031
##                      Class: deer Class: dog Class: frog Class: horse
## Sensitivity               0.8040     0.6610      0.8660       0.8530
## Specificity               0.9680     0.9761      0.9792       0.9818
## Pos Pred Value            0.7363     0.7546      0.8224       0.8387
## Neg Pred Value            0.9780     0.9628      0.9850       0.9836
## Prevalence                0.1000     0.1000      0.1000       0.1000
## Detection Rate            0.0804     0.0661      0.0866       0.0853
## Detection Prevalence      0.1092     0.0876      0.1053       0.1017
## Balanced Accuracy         0.8860     0.8186      0.9226       0.9174
##                      Class: ship Class: truck
## Sensitivity               0.9010       0.8620
## Specificity               0.9856       0.9864
## Pos Pred Value            0.8739       0.8760
## Neg Pred Value            0.9890       0.9847
## Prevalence                0.1000       0.1000
## Detection Rate            0.0901       0.0862
## Detection Prevalence      0.1031       0.0984
## Balanced Accuracy         0.9433       0.9242

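The confusion matrix shows 80.15% accuracy on the test set. The tuned model overfits somewhat: its training accuracy (92.4%) is about 12 percentage points higher than its test accuracy, which we consider acceptable. The model has the most difficulty with cats and dogs (sensitivity of 0.650 and 0.661; 173 dogs were predicted as cats and 118 cats as dogs), but on the other classes it does a fairly good job.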
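The cat/dog difficulty and the train/test gap can both be read straight from the numbers above. The base-R sketch below copies the tuned model's confusion matrix from the output, computes per-class sensitivity (correct predictions divided by the column total, since columns are the reference labels) and finds the largest off-diagonal cell. Variable names are illustrative.

```r
classes <- c("airplane", "automobile", "bird", "cat", "deer",
             "dog", "frog", "horse", "ship", "truck")

# Rows = predictions, columns = reference labels (copied from the output above)
cm3 <- matrix(c(
  809,   7,  43,  13,   8,   5,   6,   9,  25,  19,
   10, 894,   6,   4,   1,   3,   2,   1,  18,  64,
   56,   1, 715,  50,  46,  32,  25,  15,  11,   5,
   21,   9,  47, 650,  39, 173,  46,  32,  16,  11,
   11,   5,  77,  60, 804,  42,  30,  54,   5,   4,
    1,   4,  36, 118,  17, 661,  10,  25,   2,   2,
    7,   5,  45,  56,  37,  28, 866,   3,   3,   3,
    4,   0,  20,  28,  42,  49,   7, 853,   4,  10,
   55,  20,   7,  10,   4,   4,   6,   4, 901,  20,
   26,  55,   4,  11,   2,   3,   2,   4,  15, 862),
  nrow = 10, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

sensitivity <- diag(cm3) / colSums(cm3)
round(sort(sensitivity), 3)  # cat (0.650) and dog (0.661) are lowest

off_diag <- cm3
diag(off_diag) <- 0
worst <- which(off_diag == max(off_diag), arr.ind = TRUE)
rownames(cm3)[worst[1, 1]]  # predicted "cat"
colnames(cm3)[worst[1, 2]]  # actually "dog" (173 images)

train_acc <- 0.924
test_acc  <- sum(diag(cm3)) / sum(cm3)  # 0.8015
train_acc - test_acc                    # about 0.12
```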

Conclusion

We have managed to build a model that classifies the CIFAR-10 dataset. Before arriving at this tuned model, I tried many other models. The tuned model uses layer_dropout() to reduce overfitting; without it, the model overfit much more heavily. For further tuning, we could try higher dropout rates. With a gap of about 12 percentage points between training and test accuracy, however, the model is good to go.