Convolutional Neural Network
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlaps to cover the entire visual area.
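To make the receptive-field idea concrete, here is a minimal base-R sketch (purely illustrative, not part of the keras model we build below; the helper conv2d_valid is our own): in a single 3×3 convolution, each output value only sees the 3×3 patch of the input directly beneath it.
# Minimal base-R sketch of a 3x3 "valid" convolution: out[i, j] depends only on
# the 3x3 patch img[i:(i+2), j:(j+2)] -- that patch is its receptive field.
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]  # receptive field of out[i, j]
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}
img <- matrix(runif(25), 5, 5)                        # toy 5x5 grayscale "image"
kernel <- matrix(c(0, 1, 0, 1, -4, 1, 0, 1, 0), 3, 3) # simple edge-like (Laplacian) filter
conv2d_valid(img, kernel)                             # 3x3 feature map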
CIFAR-10
The CIFAR-10 dataset, as its name suggests, contains 10 different categories of images. There are 60,000 images in total across 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Our goal is to build a CNN model, train it on the training set, and make predictions on the test set.
# Libraries used throughout this post (assumed loaded in a setup chunk)
library(keras)    # dataset_cifar10(), layers, fitting, prediction
library(dplyr)    # %>% pipe and case_when()
library(EBImage)  # Image(), normalize(), combine(), tile() for plotting sample images
cifar10 <- dataset_cifar10()
## Loaded Tensorflow version 2.0.0
ctrain_x <- cifar10$train$x
ctrain_y <- to_categorical(cifar10$train$y, num_classes = 10)
ctest_x <- cifar10$test$x
ctest_y <- cifar10$test$y
Exploratory Data Analysis
All the images are 32×32 color images. There are 50,000 training images and 10,000 test images in total.
dim(ctrain_x)
## [1] 50000 32 32 3
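The test set can be checked the same way:
dim(ctest_x)  # expected: 10000 32 32 3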
Let us take a peek at some of the images.
# Sample 20 random image indices (not raw pixel values) from the training set
train_sample <- sample(dim(ctrain_x)[1], 20)
fig_img <- list()
for (i in 1:20) {
  fig_mat <- cifar10$train$x[train_sample[i], , , ]
  # EBImage wants the array transposed; normalize() rescales pixel values for display
  fig_img[[i]] <- normalize(Image(transpose(fig_mat), dim = c(32, 32, 3), colormode = 'Color'))
}
fig_img_comb <- combine(fig_img[1:20])
fig_img_obj <- tile(fig_img_comb, 5)
plot(fig_img_obj, all = TRUE)
Data Preprocessing
Class Distribution
All classes are balanced, with each class making up 10% of the data.
cifar10$train$y %>%
table() %>%
prop.table()
## .
## 0 1 2 3 4 5 6 7 8 9
## 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
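The test labels follow the same balanced distribution (a quick check in the same style):
cifar10$test$y %>%
  table() %>%
  prop.table()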
Set 30 images per batch.
batch_size <- 30
Augmentation
We will use image augmentation to increase the size of the training set without acquiring new images. The goal is to teach the model not only with the original images but also with modified versions of them, such as flipped, rotated, zoomed, or cropped images. This produces a more robust model.
On this dataset we will apply the following augmentations:
- Scale the pixel values by dividing them by 255
- Flip the image horizontally
- Flip the image vertically
- Rotate the image by up to 45 degrees
- Set the brightness range from 0.3 to 1.2
- Zoom in or out by up to 25% (zoom between 75% and 125%)
image_gen <- image_data_generator(rescale = 1/255,
horizontal_flip = T,
vertical_flip = T,
rotation_range = 45,
brightness_range=c(0.3,1.2),
zoom_range = 0.25
)
train_image_array_gen <- flow_images_from_data(ctrain_x,
ctrain_y,
batch_size = batch_size,
generator = image_gen)
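To sanity-check the generator, we can pull a single batch and inspect its shape (a quick sketch; generator_next() comes from the keras package):
batch <- generator_next(train_image_array_gen)
dim(batch[[1]])  # expected: 30 32 32 3  (augmented, rescaled images)
dim(batch[[2]])  # expected: 30 10       (one-hot labels)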
Model Building with TensorFlow
For our initial model, we will start with the following layers:
- Convolutional layer to extract features from the 2D image, with relu activation function and 32 filters
- Max pooling layer to downsample the image features
- Convolutional layer to extract features from the 2D image, with relu activation function and 64 filters
- Max pooling layer to downsample the image features
- Flattening layer to flatten the data from a 2D array to a 1D array
- Dense layer to capture more information, with relu activation function and 64 units
- Dense layer for the output, with softmax activation function and 10 units (the target classes)
# cmodel1 <- keras_model_sequential() %>%
# layer_conv_2d(filters = 32,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_max_pooling_2d(pool_size = c(2,2)) %>%
# layer_conv_2d(filters = 64,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_max_pooling_2d(pool_size = c(2,2)) %>%
# layer_flatten() %>%
# layer_dense(units = 64,
# activation = "relu") %>%
# layer_dense(name = "Output",
# units = 10,
# activation = "softmax")
cmodel1 <- load_model_hdf5("cmodel1.hdf5")
cmodel1
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_1 (Conv2D) (None, 32, 32, 32) 896
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 16, 16, 32) 0
## ________________________________________________________________________________
## conv2d (Conv2D) (None, 16, 16, 64) 18496
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 8, 8, 64) 0
## ________________________________________________________________________________
## flatten (Flatten) (None, 4096) 0
## ________________________________________________________________________________
## dense (Dense) (None, 64) 262208
## ________________________________________________________________________________
## Output (Dense) (None, 10) 650
## ================================================================================
## Total params: 282,250
## Trainable params: 282,250
## Non-trainable params: 0
## ________________________________________________________________________________
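As a quick sanity check of the parameter counts in the summary (plain arithmetic):
3 * 3 * 3 * 32 + 32    # conv2d_1: 3x3 kernel over 3 channels, 32 filters + biases = 896
3 * 3 * 32 * 64 + 64   # conv2d: 3x3 kernel over 32 channels, 64 filters + biases = 18496
8 * 8 * 64 * 64 + 64   # dense: 4096 flattened features x 64 units + biases = 262208
64 * 10 + 10           # Output: 64 units x 10 classes + biases = 650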
Model Fitting
We will use categorical_crossentropy as the loss function, the adam optimizer with a learning rate of 0.001, and accuracy as the metric.
# cmodel1 %>%
# compile(
# loss = "categorical_crossentropy",
# optimizer = optimizer_adam(learning_rate = 0.001),
# metrics = "accuracy")
Fit the model for 20 epochs, feeding batches from the augmentation generator.
# chistory1 <- cmodel1 %>%
# fit_generator(
# train_image_array_gen,
# steps_per_epoch = as.integer(50000 / batch_size),
# epochs = 20
# )
chistory1 <- readRDS("chistory1.RDS")
plot(chistory1)
## `geom_smooth()` using formula 'y ~ x'
chistory1
## Trained on 1,666 samples (batch_size=NULL, epochs=20)
## Final epoch (plot to see history):
## loss: 1.144
## accuracy: 0.6001
Our initial model reaches about 60% training accuracy.
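Before the detailed evaluation below, a quick cross-check on the test set using keras' built-in evaluate() (a sketch with the objects loaded earlier; note the same 1/255 scaling used by the generator):
cmodel1 %>%
  evaluate(ctest_x / 255,
           to_categorical(cifar10$test$y, num_classes = 10))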
Model Evaluation
Evaluate the model by predicting the test set.
pred_test <- cmodel1 %>%
predict(ctest_x/255) %>% #scale the x
k_argmax() %>%
as.array()
head(pred_test,10)
## [1] 3 9 8 8 6 6 1 6 3 1
Decode the test predictions back into their class labels.
decode <- function(x){
case_when(x == 0 ~ "airplane",
x == 1 ~ "automobile",
x == 2 ~ "bird",
x == 3 ~ "cat",
x == 4 ~ "deer",
x == 5 ~ "dog",
x == 6 ~ "frog",
x == 7 ~ "horse",
x == 8 ~ "ship",
x == 9 ~ "truck"
)
}
pred_test <- sapply(pred_test, decode)
head(pred_test)
## [1] "cat" "truck" "ship" "ship" "frog" "frog"
The confusion matrix shows 60.5% accuracy on the test set. The model does not overfit, but this accuracy is quite low, so we will try to tune the model.
ctest_y <- sapply(cifar10$test$y,decode)
caret::confusionMatrix(as.factor(pred_test),
as.factor(ctest_y))
## Confusion Matrix and Statistics
##
## Reference
## Prediction airplane automobile bird cat deer dog frog horse ship truck
## airplane 616 6 80 21 24 18 5 18 82 12
## automobile 26 653 9 7 0 8 8 5 42 59
## bird 24 6 422 56 66 32 15 17 18 3
## cat 11 7 62 327 23 151 40 47 9 10
## deer 15 2 153 87 625 85 44 118 10 6
## dog 7 12 39 140 10 439 7 39 12 9
## frog 11 18 124 189 146 99 828 42 27 37
## horse 35 13 53 74 69 87 8 650 24 18
## ship 141 12 31 34 20 23 17 11 672 26
## truck 114 271 27 65 17 58 28 53 104 820
##
## Overall Statistics
##
## Accuracy : 0.6052
## 95% CI : (0.5955, 0.6148)
## No Information Rate : 0.1
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5613
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: airplane Class: automobile Class: bird Class: cat
## Sensitivity 0.6160 0.6530 0.4220 0.3270
## Specificity 0.9704 0.9818 0.9737 0.9600
## Pos Pred Value 0.6984 0.7993 0.6404 0.4760
## Neg Pred Value 0.9579 0.9622 0.9381 0.9277
## Prevalence 0.1000 0.1000 0.1000 0.1000
## Detection Rate 0.0616 0.0653 0.0422 0.0327
## Detection Prevalence 0.0882 0.0817 0.0659 0.0687
## Balanced Accuracy 0.7932 0.8174 0.6978 0.6435
## Class: deer Class: dog Class: frog Class: horse
## Sensitivity 0.6250 0.4390 0.8280 0.6500
## Specificity 0.9422 0.9694 0.9230 0.9577
## Pos Pred Value 0.5459 0.6148 0.5444 0.6305
## Neg Pred Value 0.9577 0.9396 0.9797 0.9610
## Prevalence 0.1000 0.1000 0.1000 0.1000
## Detection Rate 0.0625 0.0439 0.0828 0.0650
## Detection Prevalence 0.1145 0.0714 0.1521 0.1031
## Balanced Accuracy 0.7836 0.7042 0.8755 0.8038
## Class: ship Class: truck
## Sensitivity 0.6720 0.8200
## Specificity 0.9650 0.9181
## Pos Pred Value 0.6809 0.5267
## Neg Pred Value 0.9636 0.9787
## Prevalence 0.1000 0.1000
## Detection Rate 0.0672 0.0820
## Detection Prevalence 0.0987 0.1557
## Balanced Accuracy 0.8185 0.8691
Model Tuning
We will tune our model with the following layers:
- Convolutional layer to extract features from the 2D image, with relu activation function and 32 filters
- Convolutional layer to extract features from the 2D image, with relu activation function and 32 filters
- Max pooling layer to downsample the image features
- Dropout layer to prevent overfitting
- Convolutional layer to extract features from the 2D image, with relu activation function and 64 filters
- Max pooling layer to downsample the image features
- Dropout layer to prevent overfitting
- Convolutional layer to extract features from the 2D image, with relu activation function and 128 filters
- Max pooling layer to downsample the image features
- Dropout layer to prevent overfitting
- Flattening layer to flatten the data from a 2D array to a 1D array
- Dense layer to capture more information, with relu activation function and 1024 units
- Dense layer to capture more information, with relu activation function and 512 units
- Dropout layer to prevent overfitting
- Dense layer for the output, with softmax activation function and 10 units (the target classes)
# cmodel3 <- keras_model_sequential() %>%
# layer_conv_2d(filters = 32,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_conv_2d(filters = 32,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_max_pooling_2d(pool_size = c(2,2)) %>%
# layer_dropout(0.25) %>%
# layer_conv_2d(filters = 64,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_max_pooling_2d(pool_size = c(2,2)) %>%
# layer_dropout(0.25) %>%
# layer_conv_2d(filters = 128,
# kernel_size = c(3,3),
# padding = "same",
# activation = "relu",
# input_shape = c(32,32, 3)
# ) %>%
# layer_max_pooling_2d(pool_size = c(2,2)) %>%
# layer_dropout(0.25) %>%
# layer_flatten() %>%
# layer_dense(units = 1024,
# activation = "relu") %>%
# layer_dense(units = 512,
# activation = "relu") %>%
# layer_dropout(0.25) %>%
# layer_dense(name = "Output",
# units = 10,
# activation = "softmax")
cmodel3 <- load_model_hdf5("cmodel3.hdf5")
cmodel3
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_7 (Conv2D) (None, 32, 32, 32) 896
## ________________________________________________________________________________
## conv2d_6 (Conv2D) (None, 32, 32, 32) 9248
## ________________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D) (None, 16, 16, 32) 0
## ________________________________________________________________________________
## dropout_3 (Dropout) (None, 16, 16, 32) 0
## ________________________________________________________________________________
## conv2d_5 (Conv2D) (None, 16, 16, 64) 18496
## ________________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D) (None, 8, 8, 64) 0
## ________________________________________________________________________________
## dropout_2 (Dropout) (None, 8, 8, 64) 0
## ________________________________________________________________________________
## conv2d_4 (Conv2D) (None, 8, 8, 128) 73856
## ________________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D) (None, 4, 4, 128) 0
## ________________________________________________________________________________
## dropout_1 (Dropout) (None, 4, 4, 128) 0
## ________________________________________________________________________________
## flatten_1 (Flatten) (None, 2048) 0
## ________________________________________________________________________________
## dense_3 (Dense) (None, 1024) 2098176
## ________________________________________________________________________________
## dense_2 (Dense) (None, 512) 524800
## ________________________________________________________________________________
## dropout (Dropout) (None, 512) 0
## ________________________________________________________________________________
## Output (Dense) (None, 10) 5130
## ================================================================================
## Total params: 2,730,602
## Trainable params: 2,730,602
## Non-trainable params: 0
## ________________________________________________________________________________
Model Fitting
We will again use categorical_crossentropy as the loss function, the adam optimizer with a learning rate of 0.001, and accuracy as the metric.
# cmodel3 %>%
# compile(
# loss = "categorical_crossentropy",
# optimizer = optimizer_adam(learning_rate = 0.001),
# metrics = "accuracy" )
Fit the model for 30 epochs. This time we use fit instead of fit_generator, i.e. without augmentation, only rescaling the pixel values.
# chistory3 <- cmodel3 %>%
# fit(ctrain_x/255,
# ctrain_y,
# epochs = 30,
# batch_size = 1000)
chistory3 <- readRDS("chistory3.RDS")
plot(chistory3)
## `geom_smooth()` using formula 'y ~ x'
chistory3
## Trained on 50,000 samples (batch_size=1,000, epochs=30)
## Final epoch (plot to see history):
## loss: 0.2121
## accuracy: 0.924
Our tuned model reaches 92.4% training accuracy.
Model Evaluation
Evaluate the model by predicting the test set.
pred_test <- cmodel3 %>%
predict(ctest_x/255) %>%
k_argmax()
decode <- function(x){
case_when(x == 0 ~ "airplane",
x == 1 ~ "automobile",
x == 2 ~ "bird",
x == 3 ~ "cat",
x == 4 ~ "deer",
x == 5 ~ "dog",
x == 6 ~ "frog",
x == 7 ~ "horse",
x == 8 ~ "ship",
x == 9 ~ "truck"
)
}
pred_test <- sapply(as.array(pred_test), decode)
ctest_y <- sapply(cifar10$test$y,decode)
caret::confusionMatrix(as.factor(pred_test),
as.factor(ctest_y))
## Confusion Matrix and Statistics
##
## Reference
## Prediction airplane automobile bird cat deer dog frog horse ship truck
## airplane 809 7 43 13 8 5 6 9 25 19
## automobile 10 894 6 4 1 3 2 1 18 64
## bird 56 1 715 50 46 32 25 15 11 5
## cat 21 9 47 650 39 173 46 32 16 11
## deer 11 5 77 60 804 42 30 54 5 4
## dog 1 4 36 118 17 661 10 25 2 2
## frog 7 5 45 56 37 28 866 3 3 3
## horse 4 0 20 28 42 49 7 853 4 10
## ship 55 20 7 10 4 4 6 4 901 20
## truck 26 55 4 11 2 3 2 4 15 862
##
## Overall Statistics
##
## Accuracy : 0.8015
## 95% CI : (0.7935, 0.8093)
## No Information Rate : 0.1
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7794
##
## Mcnemar's Test P-Value : 1.996e-06
##
## Statistics by Class:
##
## Class: airplane Class: automobile Class: bird Class: cat
## Sensitivity 0.8090 0.8940 0.7150 0.6500
## Specificity 0.9850 0.9879 0.9732 0.9562
## Pos Pred Value 0.8570 0.8913 0.7479 0.6226
## Neg Pred Value 0.9789 0.9882 0.9685 0.9609
## Prevalence 0.1000 0.1000 0.1000 0.1000
## Detection Rate 0.0809 0.0894 0.0715 0.0650
## Detection Prevalence 0.0944 0.1003 0.0956 0.1044
## Balanced Accuracy 0.8970 0.9409 0.8441 0.8031
## Class: deer Class: dog Class: frog Class: horse
## Sensitivity 0.8040 0.6610 0.8660 0.8530
## Specificity 0.9680 0.9761 0.9792 0.9818
## Pos Pred Value 0.7363 0.7546 0.8224 0.8387
## Neg Pred Value 0.9780 0.9628 0.9850 0.9836
## Prevalence 0.1000 0.1000 0.1000 0.1000
## Detection Rate 0.0804 0.0661 0.0866 0.0853
## Detection Prevalence 0.1092 0.0876 0.1053 0.1017
## Balanced Accuracy 0.8860 0.8186 0.9226 0.9174
## Class: ship Class: truck
## Sensitivity 0.9010 0.8620
## Specificity 0.9856 0.9864
## Pos Pred Value 0.8739 0.8760
## Neg Pred Value 0.9890 0.9847
## Prevalence 0.1000 0.1000
## Detection Rate 0.0901 0.0862
## Detection Prevalence 0.1031 0.0984
## Balanced Accuracy 0.9433 0.9242
The confusion matrix shows 80.1% accuracy on the test set. Our tuned model tends to overfit, but the gap is within an acceptable range (about 12% between training and test accuracy). The model still has a hard time distinguishing the cat and dog classes, but on the other classes it does a pretty good job.
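A quick way to see which classes drag the accuracy down is to compute per-class recall directly from the predictions (using the pred_test and ctest_y objects created above):
cm <- table(Prediction = pred_test, Reference = ctest_y)
round(diag(cm) / colSums(cm), 3)  # recall per class; cat and dog come out lowest here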
Conclusion
We have managed to build a model to classify the CIFAR-10 dataset. Before arriving at this tuned model, I tried many other models. In this tuned model I used layer_dropout to prevent overfitting; without it, the model overfits badly. For further tuning, we could set the layer_dropout rates higher, but with a 12% gap between the training and test accuracy, the model is good to go.
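A minimal sketch of that further tuning (hypothetical, not trained in this post): keep the cmodel3 architecture but raise the dropout rates, for example:
# Hypothetical variant of cmodel3 with stronger regularization (not trained here):
#   layer_dropout(rate = 0.4)   # after each max-pooling block (was 0.25)
#   layer_dropout(rate = 0.5)   # before the Output layer (was 0.25)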