Chapter 1 Introduction

1.1 CNN project introduction

In this project, I use a convolutional neural network (CNN) to classify gender from face images. To achieve good predictive accuracy, several key steps need to be considered.

  1. Preprocessing is crucial for the model’s performance. Techniques such as face normalization and data augmentation help standardize input images, reduce variability, and enlarge the dataset for improved robustness.

  2. Designing a robust CNN architecture is pivotal. Convolutional layers capture hierarchical features in facial images, pooling layers perform spatial down-sampling, fully connected layers carry out the gender classification, and techniques like dropout mitigate overfitting.

  3. Optimizing hyperparameters, such as the learning rate, batch size, architecture specifics, and number of epochs, can significantly impact model accuracy.

  4. Lastly, I fine-tune the model using transfer learning on a pre-trained network, Xception (a deep architecture built on depthwise separable convolutions), leveraging its learned features to improve performance on this gender prediction task.

Chapter 2 Data Preprocessing

2.1 Dataset description

The dataset contains 27,167 JPG files, of which 17,678 are photos of men’s faces and 9,489 are photos of women’s faces. Each file is named according to its category, e.g. woman__0, woman__1, woman_2, etc.

I then create new folders containing three subsets:

  1. Train: 12,000 men, 5,500 women, 17,500 in total

  2. Validation: 2,500 men, 1,200 women, 3,700 in total

  3. Test: 2,500 men, 1,200 women, 3,700 in total
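
These counts amount to roughly a 70/15/15 split; a quick check in R:

c(train = 17500, validation = 3700, test = 3700) / (17500 + 3700 + 3700)
# train 0.703, validation 0.149, test 0.149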

2.1.1 Folder structure

library(here)
library(keras)

# Assuming the original 'man' and 'woman' image directories exist under the
# paths below, first create the folder structure
base.dir = here("data/faces")
if (!dir.exists(base.dir)) dir.create(base.dir)

original.dir = paste(base.dir,"original",sep="/")
if (!dir.exists(original.dir)) dir.create(original.dir)

original.woman.dir = paste(original.dir,"woman",sep="/")
if (!dir.exists(original.woman.dir )) dir.create(original.woman.dir)

original.man.dir = paste(original.dir,"man",sep="/")
if (!dir.exists(original.man.dir )) dir.create(original.man.dir)

train.dir = paste(base.dir,"train",sep = "/")
validation.dir = paste(base.dir,"validation",sep = "/")
test.dir = paste(base.dir,"test",sep = "/")
if (!dir.exists(train.dir)) dir.create(train.dir)
if (!dir.exists(validation.dir)) dir.create(validation.dir)
if (!dir.exists(test.dir)) dir.create(test.dir)

train.woman.dir = paste(train.dir,"woman",sep = "/")
train.man.dir = paste(train.dir,"man",sep = "/")
if (!dir.exists(train.woman.dir)) dir.create(train.woman.dir)
if (!dir.exists(train.man.dir)) dir.create(train.man.dir)

validation.woman.dir = paste(validation.dir,"woman",sep = "/")
validation.man.dir = paste(validation.dir,"man",sep = "/")
if (!dir.exists(validation.woman.dir)) dir.create(validation.woman.dir)
if (!dir.exists(validation.man.dir)) dir.create(validation.man.dir)

test.woman.dir = paste(test.dir,"woman",sep = "/")
test.man.dir = paste(test.dir,"man",sep = "/")
if (!dir.exists(test.woman.dir)) dir.create(test.woman.dir)
if (!dir.exists(test.man.dir)) dir.create(test.man.dir)
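
Equivalently, the whole subset tree can be created with two short loops; a compact sketch, assuming the same base.dir as above:

for (subset in c("train", "validation", "test")) {
  for (cls in c("man", "woman")) {
    d <- file.path(base.dir, subset, cls)
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
}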

2.1.2 Cleaning up the data

While browsing the image dataset, I found a number of misclassified pictures: 102 images were wrongly labeled as women, 54 were wrongly labeled as men, and 43 showed both genders or no gender at all. Addressing these discrepancies is important for model quality, so I deleted these images from the original dataset.

men <- list.files(original.man.dir)
women <- list.files(original.woman.dir)

# Some images are mislabeled; I collected the wrongly classified pictures
# from the original folders into separate wrong_* directories.
# Images wrongly labeled as women:
wrong.women.dir = here("data/faces/original/wrong_woman")
if (!dir.exists(wrong.women.dir)) dir.create(wrong.women.dir)
wrong_women =list.files(wrong.women.dir)
cat("wrongly classified pictures as women is ", length(wrong_women),"\n")
## wrongly classified pictures as women is  102
image_list <- lapply(wrong_women, function(filename) {
  file_path <- file.path(wrong.women.dir, filename)
  image <- imager::load.image(file_path)
  return(image)
})


par(mfrow = c(2, 4))

for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(image_list[[i]])
  title("Wrong classified as women", line = 1)
}

# Images wrongly labeled as men:
wrong.men.dir = here("data/faces/original/wrong_man")
if (!dir.exists(wrong.men.dir)) dir.create(wrong.men.dir)
wrong_men =list.files(wrong.men.dir)
cat("wrongly classified pictures as women is ", length(wrong_men))
## wrongly classified pictures as women is  54
image_men_list <- lapply(wrong_men, function(filename) {
  file_path <- file.path(wrong.men.dir, filename)
  image <- imager::load.image(file_path)
  return(image)
})
par(mfrow = c(2, 4))

for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(image_men_list[[i]])
  title("Wrong classified as men", line = 1)
}

# Images showing both genders or no gender:
wrong.dir = here("data/faces/original/wrong")
if (!dir.exists(wrong.dir)) dir.create(wrong.dir)
wrong =list.files(wrong.dir)
cat("wrongly have both gender or no gender is ", length(wrong))
## wrongly have both gender or no gender is  43
wrong_list <- lapply(wrong, function(filename) {
  file_path <- file.path(wrong.dir, filename)
  image <- imager::load.image(file_path)
  return(image)
})
par(mfrow = c(2, 4))
for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(wrong_list[[i]])
  title("Wrong have both gender or no gender", line = 1)
}

par(mfrow = c(1, 1))
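
The deletion step itself is not shown above; a minimal sketch, assuming the file names collected in wrong_women, wrong_men and wrong are exactly the misclassified files to drop:

# remove the misclassified images from the original class folders
file.remove(file.path(original.woman.dir, wrong_women))
file.remove(file.path(original.man.dir, wrong_men))
# the ambiguous images may have come from either folder, so try both
suppressWarnings(file.remove(file.path(original.woman.dir, wrong)))
suppressWarnings(file.remove(file.path(original.man.dir, wrong)))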

Splitting the data into three subsets:

  1. Train: 12,000 men, 5,500 women, 17,500 in total.

  2. Validation: 2,500 men, 1,200 women, 3,700 in total.

  3. Test: 2,500 men, 1,200 women, 3,700 in total.

men <- list.files(original.man.dir)
length(men)    # 17,678 before clean-up, 17,596 after
women <- list.files(original.woman.dir)
length(women)  # 8,349 before clean-up, 8,195 after

train.men=men[1:12000]
file.copy(from=file.path(original.man.dir,train.men),
          to=file.path(train.man.dir))
train.women=women[1:5500]
file.copy(from=file.path(original.woman.dir,train.women),
          to=file.path(train.woman.dir))

validation.men=men[12001:14500]
file.copy(from=file.path(original.man.dir,validation.men),
          to=file.path(validation.man.dir))
validation.women=women[5501:6700]
file.copy(from=file.path(original.woman.dir,validation.women),
          to=file.path(validation.woman.dir))

test.men=men[14501:17000]
file.copy(from=file.path(original.man.dir,test.men),
          to=file.path(test.man.dir))
test.women=women[6701:7900]
file.copy(from=file.path(original.woman.dir,test.women),
          to=file.path(test.woman.dir))
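
As a quick sanity check, count the files in each subset folder; a minimal sketch:

dirs <- c(train.man.dir, train.woman.dir,
          validation.man.dir, validation.woman.dir,
          test.man.dir, test.woman.dir)
sapply(dirs, function(d) length(list.files(d)))
# expected counts: 12000, 5500, 2500, 1200, 2500, 1200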

Chapter 3 Models

3.1 Model 1: without dropout layers

3.1.1 Data formation

The data must now be formatted into appropriately pre-processed floating-point tensors.

Currently, the data sets are stored as JPEG files on disk. I use image_data_generator() together with flow_images_from_directory(), which automatically turn image files on disk into batches of pre-processed tensors.

train_generator = flow_images_from_directory(
  directory = train.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150,150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
table(train_generator$classes)
## 
##     0     1 
## 12000  5500
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150,150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
table(validation_generator$classes)
## 
##    0    1 
## 2500 1200
test_generator = flow_images_from_directory(
  directory = test.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150,150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
table(test_generator$classes)
## 
##    0    1 
## 2500 1200
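
To confirm what the generators yield, pull a single batch and inspect it. Note that flow_images_from_directory() assigns class indices alphabetically by folder name, so here 0 = man and 1 = woman; a small sketch:

batch <- generator_next(train_generator)
dim(batch[[1]])   # 32 x 150 x 150 x 3: a batch of images rescaled to [0, 1]
batch[[2]]        # the 32 corresponding binary labels (0 = man, 1 = woman)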

3.1.2 Basic CNN architecture

I start with a baseline CNN of four convolutional layers, with filters in the sequence 32-64-128-128. This basic architecture aims to capture hierarchical patterns in the facial images.

model = keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3,3),
    padding = "same",
    activation = "relu",
    input_shape = c(150,150,3)
  ) %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(
    optimizer=optimizer_rmsprop(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics="acc"
  )
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_3 (Conv2D)                  (None, 150, 150, 32)            896         
##  max_pooling2d_3 (MaxPooling2D)     (None, 75, 75, 32)              0           
##  conv2d_2 (Conv2D)                  (None, 73, 73, 64)              18496       
##  max_pooling2d_2 (MaxPooling2D)     (None, 36, 36, 64)              0           
##  conv2d_1 (Conv2D)                  (None, 34, 34, 128)             73856       
##  max_pooling2d_1 (MaxPooling2D)     (None, 17, 17, 128)             0           
##  conv2d (Conv2D)                    (None, 15, 15, 128)             147584      
##  max_pooling2d (MaxPooling2D)       (None, 7, 7, 128)               0           
##  flatten (Flatten)                  (None, 6272)                    0           
##  dense_1 (Dense)                    (None, 512)                     3211776     
##  dense (Dense)                      (None, 1)                       513         
## ================================================================================
## Total params: 3453121 (13.17 MB)
## Trainable params: 3453121 (13.17 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
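
The parameter counts in the summary can be checked by hand: a 3x3 convolution with c input channels and f filters has 3*3*c*f weights plus f biases. For example:

3*3*3*32 + 32       # 896: first conv layer (3 input channels, 32 filters)
3*3*32*64 + 64      # 18496: second conv layer
6272*512 + 512      # 3211776: dense layer on the flattened 7*7*128 = 6272 features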

3.1.3 Fitting the model using a batch generator

model1=model
history1 = model1 %>%
  fit_generator(
    generator = train_generator ,
    steps_per_epoch = 200,
    epochs = 20,
    validation_data = validation_generator,
    validation_steps =50
  )
#save the model
model1   %>% save_model_hdf5(here("output/faces_model1.h5"))
history1 %>% saveRDS(here("output/faces_model1_history.rds"))
model1   <- load_model_hdf5(here("output/faces_model1.h5"))
history1 <- readRDS(here("output/faces_model1_history.rds"))

#Plot the loss and accuracy of model over the training and validation data
plot(history1)

With only 20 epochs, the validation accuracy reaches approximately 86%. However, the history plot shows clear signs of overfitting. To address this, I add dropout layers to the model. This regularization technique aims to mitigate overfitting and improve the model’s generalization to unseen data.

3.2 Model 2: with two dropout layers

model.dropout = keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3,3),
    padding = "same",
    activation = "relu",
    input_shape = c(150,150,3)
  ) %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.25) %>%  #add dropout 
  layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.5) %>% #add dropout
  
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%

  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(
    optimizer=optimizer_rmsprop(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics="acc"
  )
summary(model.dropout)
## Model: "sequential_1"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_7 (Conv2D)                  (None, 150, 150, 32)            896         
##  max_pooling2d_7 (MaxPooling2D)     (None, 75, 75, 32)              0           
##  conv2d_6 (Conv2D)                  (None, 73, 73, 64)              18496       
##  max_pooling2d_6 (MaxPooling2D)     (None, 36, 36, 64)              0           
##  dropout_1 (Dropout)                (None, 36, 36, 64)              0           
##  conv2d_5 (Conv2D)                  (None, 34, 34, 128)             73856       
##  max_pooling2d_5 (MaxPooling2D)     (None, 17, 17, 128)             0           
##  conv2d_4 (Conv2D)                  (None, 15, 15, 128)             147584      
##  max_pooling2d_4 (MaxPooling2D)     (None, 7, 7, 128)               0           
##  dropout (Dropout)                  (None, 7, 7, 128)               0           
##  flatten_1 (Flatten)                (None, 6272)                    0           
##  dense_3 (Dense)                    (None, 512)                     3211776     
##  dense_2 (Dense)                    (None, 1)                       513         
## ================================================================================
## Total params: 3453121 (13.17 MB)
## Trainable params: 3453121 (13.17 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
model2=model.dropout
history2 = model2 %>%
  fit_generator(
    generator = train_generator ,
    steps_per_epoch = 200,
    epochs = 20,
    validation_data = validation_generator,
    validation_steps =50
  )

model2   %>% save_model_hdf5(here("output/faces_model2.h5"))
history2 %>% saveRDS(here("output/faces_model1_history2.rds"))
# loading from files which were prepared earlier
model2   <- load_model_hdf5(here("output/faces_model2.h5"))
history2 <- readRDS(here("output/faces_model1_history2.rds"))
plot(history2)

Comparing the history plots of Model 1 and Model 2, the introduction of dropout layers has reduced overfitting: the gap between training and validation accuracy is now smaller, indicating a more balanced model that generalizes better to the validation data.

3.3 Model 3: training for 100 epochs

Observing the positive trend in accuracy with increasing epochs in Model 2, I extend training to 100 epochs. The additional iterations give the model more opportunity to capture intricate patterns and refine its performance on the validation set.

# Note: model.dropout is the same underlying Keras object already trained as
# Model 2, so this run continues from Model 2's weights rather than from scratch.
model3=model.dropout
history3 = model3 %>%
  fit_generator(
    generator = train_generator ,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps =50
  )

model3   %>% save_model_hdf5(here("output/faces_model4.h5"))
history3 %>% saveRDS(here("output/faces_model1_history4.rds"))
# loading from files which were prepared earlier
model3   <- load_model_hdf5(here("output/faces_model4.h5"))
history3 <- readRDS(here("output/faces_model1_history4.rds"))
plot(history3)

With an extended training duration of 100 epochs, the accuracy has shown improvement, rising from 86% to 90%.

3.4 Model 4: with data augmentation

Data augmentation is a crucial technique in machine learning for enhancing model generalization. It involves creating additional training data by applying various random transformations to existing samples, producing diverse and realistic images. By introducing these variations during training, the model becomes exposed to different aspects of the data, preventing overfitting and promoting better generalization.

First, take a look at some “augmented” pictures:

fnames = list.files(train.man.dir)
img_path = fnames[[10]]
img = image_load(paste0(train.man.dir,"/",img_path),target_size = c(150,150))
img_array = image_to_array(img)
#original picture
plot(img_array %>% as.raster(max=255))

#using data augmentation
datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode="nearest"
)   
img_array = array_reshape(img_array,c(1,150,150,3))
augmentation_generator = flow_images_from_data(
  img_array,
  generator = datagen,
  batch_size = 1
)


par(mfrow = c(2, 3))
for (i in 1:6) {
  batch <- generator_next(augmentation_generator)
  plot(as.raster(batch[1, , , ]))
   title("Data augmentation", line = 1)
}

Then apply data augmentation to the training data:

# augment the training data
datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode="nearest"
)   

train_generator <- flow_images_from_directory(
  train.dir,  #fold
  datagen,   #the generator
  target_size = c(150, 150), 
  batch_size = 32,  #sample size 32 to generater new data
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150,150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
table(validation_generator$classes)
## 
##    0    1 
## 2500 1200
test_generator = flow_images_from_directory(
  directory = test.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150,150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
  • rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
  • width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally.
  • shear_range is for randomly applying shearing transformations.
  • zoom_range is for randomly zooming inside pictures.
  • horizontal_flip is for randomly flipping half the images horizontally—relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.

Finally, fit Model 4 on the augmented training data.

# Note: this continues training the same model.dropout object (now holding
# Model 3's weights), this time on augmented training data.
model4=model.dropout
history4 <- 
  model4 %>% 
  fit_generator(
    train_generator,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 100
  )

model4   %>% save_model_hdf5(here("output/faces_model6.h5"))
history4 %>% saveRDS(here("output/faces_model1_history6.rds"))
#loading from files which were prepared earlier
model4   <- load_model_hdf5(here("output/faces_model6.h5"))
history4 <- readRDS(here("output/faces_model1_history6.rds"))
plot(history4)

With data augmentation, 100 epochs increase the validation accuracy from 90% to 91.4%. Model 4 is therefore the best model so far, and I use it to evaluate the test data.

acc.model4 <- evaluate(model4, test_generator)[[2]] * 100
## 116/116 - 15s - loss: 0.1910 - acc: 0.9230 - 15s/epoch - 130ms/step
print(sprintf('optimal CNN Model with 100 epochs for Test data accuracy: %5.2f %%', acc.model4))
## [1] "optimal CNN Model with 100 epochs for Test data accuracy: 92.30 %"

CNN Model 4 reaches a test accuracy of 92.30% after 100 epochs.
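
Accuracy alone can hide per-class behavior on this imbalanced test set (2,500 men vs. 1,200 women). A confusion matrix separates the two error types; a minimal sketch, assuming a test generator created with shuffle = FALSE so that prediction order lines up with test_generator$classes:

probs <- predict(model4, test_generator)   # predicted P(woman), since woman = class 1
pred  <- as.integer(probs > 0.5)
table(predicted = pred, actual = test_generator$classes)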

3.5 Model 5: an advanced model, Xception

Xception is a deep learning architecture that belongs to the family of convolutional neural networks (CNNs). It was introduced by François Chollet, the creator of the Keras library, in his research paper “Xception: Deep Learning with Depthwise Separable Convolutions.”

The term “Xception” is a blend of “Extreme Inception,” indicating its relationship to the Inception architecture. Xception is designed to improve upon the traditional Inception modules by replacing standard convolutions with depthwise separable convolutions.

# augment the training data; note there is no rescale here, because the
# Xception pipeline below rescales its own inputs
datagen <- image_data_generator(
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode="nearest"
)   

train_generator <- flow_images_from_directory(
  train.dir,  #fold
  datagen,   #the generator
  target_size = c(150, 150), 
  batch_size = 32,  #sample size 32 to generater new data
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  target_size = c(150,150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
test_generator = flow_images_from_directory(
  directory = test.dir,
  target_size = c(150,150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.

Building the Xception model and tuning its parameters

# Load pre-trained Xception model
base_model <- application_xception(
  weights = "imagenet",
  input_shape = c(150, 150, 3),
  include_top = FALSE
)
summary(base_model)
## Model: "xception"
## ________________________________________________________________________________
##  Layer (type)       Output Shape         Para   Connected to         Trainable  
##                                          m #                                    
## ================================================================================
##  input_1 (InputLay  [(None, 150, 150,    0      []                   Y          
##  er)                3)]                                                         
##  block1_conv1 (Con  (None, 74, 74, 32)   864    ['input_1[0][0]']    Y          
##  v2D)                                                                           
##  block1_conv1_bn (  (None, 74, 74, 32)   128    ['block1_conv1[0][   Y          
##  BatchNormalizatio                              0]']                            
##  n)                                                                             
##  block1_conv1_act   (None, 74, 74, 32)   0      ['block1_conv1_bn[   Y          
##  (Activation)                                   0][0]']                         
##  block1_conv2 (Con  (None, 72, 72, 64)   1843   ['block1_conv1_act   Y          
##  v2D)                                    2      [0][0]']                        
##  block1_conv2_bn (  (None, 72, 72, 64)   256    ['block1_conv2[0][   Y          
##  BatchNormalizatio                              0]']                            
##  n)                                                                             
##  block1_conv2_act   (None, 72, 72, 64)   0      ['block1_conv2_bn[   Y          
##  (Activation)                                   0][0]']                         
##  block2_sepconv1 (  (None, 72, 72, 128   8768   ['block1_conv2_act   Y          
##  SeparableConv2D)   )                           [0][0]']                        
##  block2_sepconv1_b  (None, 72, 72, 128   512    ['block2_sepconv1[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  block2_sepconv2_a  (None, 72, 72, 128   0      ['block2_sepconv1_   Y          
##  ct (Activation)    )                           bn[0][0]']                      
##  block2_sepconv2 (  (None, 72, 72, 128   1753   ['block2_sepconv2_   Y          
##  SeparableConv2D)   )                    6      act[0][0]']                     
##  block2_sepconv2_b  (None, 72, 72, 128   512    ['block2_sepconv2[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  conv2d_8 (Conv2D)  (None, 36, 36, 128   8192   ['block1_conv2_act   Y          
##                     )                           [0][0]']                        
##  block2_pool (MaxP  (None, 36, 36, 128   0      ['block2_sepconv2_   Y          
##  ooling2D)          )                           bn[0][0]']                      
##  batch_normalizati  (None, 36, 36, 128   512    ['conv2d_8[0][0]']   Y          
##  on (BatchNormaliz  )                                                           
##  ation)                                                                         
##  add (Add)          (None, 36, 36, 128   0      ['block2_pool[0][0   Y          
##                     )                           ]',                             
##                                                  'batch_normalizat              
##                                                 ion[0][0]']                     
##  block3_sepconv1_a  (None, 36, 36, 128   0      ['add[0][0]']        Y          
##  ct (Activation)    )                                                           
##  block3_sepconv1 (  (None, 36, 36, 256   3392   ['block3_sepconv1_   Y          
##  SeparableConv2D)   )                    0      act[0][0]']                     
##  block3_sepconv1_b  (None, 36, 36, 256   1024   ['block3_sepconv1[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  block3_sepconv2_a  (None, 36, 36, 256   0      ['block3_sepconv1_   Y          
##  ct (Activation)    )                           bn[0][0]']                      
##  block3_sepconv2 (  (None, 36, 36, 256   6784   ['block3_sepconv2_   Y          
##  SeparableConv2D)   )                    0      act[0][0]']                     
##  block3_sepconv2_b  (None, 36, 36, 256   1024   ['block3_sepconv2[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  conv2d_9 (Conv2D)  (None, 18, 18, 256   3276   ['add[0][0]']        Y          
##                     )                    8                                      
##  block3_pool (MaxP  (None, 18, 18, 256   0      ['block3_sepconv2_   Y          
##  ooling2D)          )                           bn[0][0]']                      
##  batch_normalizati  (None, 18, 18, 256   1024   ['conv2d_9[0][0]']   Y          
##  on_1 (BatchNormal  )                                                           
##  ization)                                                                       
##  add_1 (Add)        (None, 18, 18, 256   0      ['block3_pool[0][0   Y          
##                     )                           ]',                             
##                                                  'batch_normalizat              
##                                                 ion_1[0][0]']                   
##  block4_sepconv1_a  (None, 18, 18, 256   0      ['add_1[0][0]']      Y          
##  ct (Activation)    )                                                           
##  block4_sepconv1 (  (None, 18, 18, 728   1886   ['block4_sepconv1_   Y          
##  SeparableConv2D)   )                    72     act[0][0]']                     
##  block4_sepconv1_b  (None, 18, 18, 728   2912   ['block4_sepconv1[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  block4_sepconv2_a  (None, 18, 18, 728   0      ['block4_sepconv1_   Y          
##  ct (Activation)    )                           bn[0][0]']                      
##  block4_sepconv2 (  (None, 18, 18, 728   5365   ['block4_sepconv2_   Y          
##  SeparableConv2D)   )                    36     act[0][0]']                     
##  block4_sepconv2_b  (None, 18, 18, 728   2912   ['block4_sepconv2[   Y          
##  n (BatchNormaliza  )                           0][0]']                         
##  tion)                                                                          
##  conv2d_10 (Conv2D  (None, 9, 9, 728)    1863   ['add_1[0][0]']      Y          
##  )                                       68                                     
##  block4_pool (MaxP  (None, 9, 9, 728)    0      ['block4_sepconv2_   Y          
##  ooling2D)                                      bn[0][0]']                      
##  batch_normalizati  (None, 9, 9, 728)    2912   ['conv2d_10[0][0]'   Y          
##  on_2 (BatchNormal                              ]                               
##  ization)                                                                       
##  add_2 (Add)        (None, 9, 9, 728)    0      ['block4_pool[0][0   Y          
##                                                 ]',                             
##                                                  'batch_normalizat              
##                                                 ion_2[0][0]']                   
##  block5_sepconv1_a  (None, 9, 9, 728)    0      ['add_2[0][0]']      Y          
##  ct (Activation)                                                                
##  block5_sepconv1 (  (None, 9, 9, 728)    5365   ['block5_sepconv1_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block5_sepconv1_b  (None, 9, 9, 728)    2912   ['block5_sepconv1[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block5_sepconv2_a  (None, 9, 9, 728)    0      ['block5_sepconv1_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block5_sepconv2 (  (None, 9, 9, 728)    5365   ['block5_sepconv2_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block5_sepconv2_b  (None, 9, 9, 728)    2912   ['block5_sepconv2[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block5_sepconv3_a  (None, 9, 9, 728)    0      ['block5_sepconv2_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block5_sepconv3 (  (None, 9, 9, 728)    5365   ['block5_sepconv3_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block5_sepconv3_b  (None, 9, 9, 728)    2912   ['block5_sepconv3[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  add_3 (Add)        (None, 9, 9, 728)    0      ['block5_sepconv3_   Y          
##                                                 bn[0][0]',                      
##                                                  'add_2[0][0]']                 
##  block6_sepconv1_a  (None, 9, 9, 728)    0      ['add_3[0][0]']      Y          
##  ct (Activation)                                                                
##  block6_sepconv1 (  (None, 9, 9, 728)    5365   ['block6_sepconv1_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block6_sepconv1_b  (None, 9, 9, 728)    2912   ['block6_sepconv1[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block6_sepconv2_a  (None, 9, 9, 728)    0      ['block6_sepconv1_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block6_sepconv2 (  (None, 9, 9, 728)    5365   ['block6_sepconv2_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block6_sepconv2_b  (None, 9, 9, 728)    2912   ['block6_sepconv2[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block6_sepconv3_a  (None, 9, 9, 728)    0      ['block6_sepconv2_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block6_sepconv3 (  (None, 9, 9, 728)    5365   ['block6_sepconv3_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block6_sepconv3_b  (None, 9, 9, 728)    2912   ['block6_sepconv3[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  add_4 (Add)        (None, 9, 9, 728)    0      ['block6_sepconv3_   Y          
##                                                 bn[0][0]',                      
##                                                  'add_3[0][0]']                 
##  block7_sepconv1_a  (None, 9, 9, 728)    0      ['add_4[0][0]']      Y          
##  ct (Activation)                                                                
##  block7_sepconv1 (  (None, 9, 9, 728)    5365   ['block7_sepconv1_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block7_sepconv1_b  (None, 9, 9, 728)    2912   ['block7_sepconv1[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block7_sepconv2_a  (None, 9, 9, 728)    0      ['block7_sepconv1_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block7_sepconv2 (  (None, 9, 9, 728)    5365   ['block7_sepconv2_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block7_sepconv2_b  (None, 9, 9, 728)    2912   ['block7_sepconv2[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block7_sepconv3_a  (None, 9, 9, 728)    0      ['block7_sepconv2_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block7_sepconv3 (  (None, 9, 9, 728)    5365   ['block7_sepconv3_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block7_sepconv3_b  (None, 9, 9, 728)    2912   ['block7_sepconv3[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  add_5 (Add)        (None, 9, 9, 728)    0      ['block7_sepconv3_   Y          
##                                                 bn[0][0]',                      
##                                                  'add_4[0][0]']                 
##  block8_sepconv1_a  (None, 9, 9, 728)    0      ['add_5[0][0]']      Y          
##  ct (Activation)                                                                
##  block8_sepconv1 (  (None, 9, 9, 728)    5365   ['block8_sepconv1_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block8_sepconv1_b  (None, 9, 9, 728)    2912   ['block8_sepconv1[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block8_sepconv2_a  (None, 9, 9, 728)    0      ['block8_sepconv1_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block8_sepconv2 (  (None, 9, 9, 728)    5365   ['block8_sepconv2_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block8_sepconv2_b  (None, 9, 9, 728)    2912   ['block8_sepconv2[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block8_sepconv3_a  (None, 9, 9, 728)    0      ['block8_sepconv2_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block8_sepconv3 (  (None, 9, 9, 728)    5365   ['block8_sepconv3_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block8_sepconv3_b  (None, 9, 9, 728)    2912   ['block8_sepconv3[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  add_6 (Add)        (None, 9, 9, 728)    0      ['block8_sepconv3_   Y          
##                                                 bn[0][0]',                      
##                                                  'add_5[0][0]']                 
##  block9_sepconv1_a  (None, 9, 9, 728)    0      ['add_6[0][0]']      Y          
##  ct (Activation)                                                                
##  block9_sepconv1 (  (None, 9, 9, 728)    5365   ['block9_sepconv1_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block9_sepconv1_b  (None, 9, 9, 728)    2912   ['block9_sepconv1[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block9_sepconv2_a  (None, 9, 9, 728)    0      ['block9_sepconv1_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block9_sepconv2 (  (None, 9, 9, 728)    5365   ['block9_sepconv2_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block9_sepconv2_b  (None, 9, 9, 728)    2912   ['block9_sepconv2[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  block9_sepconv3_a  (None, 9, 9, 728)    0      ['block9_sepconv2_   Y          
##  ct (Activation)                                bn[0][0]']                      
##  block9_sepconv3 (  (None, 9, 9, 728)    5365   ['block9_sepconv3_   Y          
##  SeparableConv2D)                        36     act[0][0]']                     
##  block9_sepconv3_b  (None, 9, 9, 728)    2912   ['block9_sepconv3[   Y          
##  n (BatchNormaliza                              0][0]']                         
##  tion)                                                                          
##  add_7 (Add)        (None, 9, 9, 728)    0      ['block9_sepconv3_   Y          
##                                                 bn[0][0]',                      
##                                                  'add_6[0][0]']                 
##  block10_sepconv1_  (None, 9, 9, 728)    0      ['add_7[0][0]']      Y          
##  act (Activation)                                                               
##  block10_sepconv1   (None, 9, 9, 728)    5365   ['block10_sepconv1   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block10_sepconv1_  (None, 9, 9, 728)    2912   ['block10_sepconv1   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block10_sepconv2_  (None, 9, 9, 728)    0      ['block10_sepconv1   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block10_sepconv2   (None, 9, 9, 728)    5365   ['block10_sepconv2   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block10_sepconv2_  (None, 9, 9, 728)    2912   ['block10_sepconv2   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block10_sepconv3_  (None, 9, 9, 728)    0      ['block10_sepconv2   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block10_sepconv3   (None, 9, 9, 728)    5365   ['block10_sepconv3   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block10_sepconv3_  (None, 9, 9, 728)    2912   ['block10_sepconv3   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  add_8 (Add)        (None, 9, 9, 728)    0      ['block10_sepconv3   Y          
##                                                 _bn[0][0]',                     
##                                                  'add_7[0][0]']                 
##  block11_sepconv1_  (None, 9, 9, 728)    0      ['add_8[0][0]']      Y          
##  act (Activation)                                                               
##  block11_sepconv1   (None, 9, 9, 728)    5365   ['block11_sepconv1   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block11_sepconv1_  (None, 9, 9, 728)    2912   ['block11_sepconv1   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block11_sepconv2_  (None, 9, 9, 728)    0      ['block11_sepconv1   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block11_sepconv2   (None, 9, 9, 728)    5365   ['block11_sepconv2   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block11_sepconv2_  (None, 9, 9, 728)    2912   ['block11_sepconv2   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block11_sepconv3_  (None, 9, 9, 728)    0      ['block11_sepconv2   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block11_sepconv3   (None, 9, 9, 728)    5365   ['block11_sepconv3   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block11_sepconv3_  (None, 9, 9, 728)    2912   ['block11_sepconv3   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  add_9 (Add)        (None, 9, 9, 728)    0      ['block11_sepconv3   Y          
##                                                 _bn[0][0]',                     
##                                                  'add_8[0][0]']                 
##  block12_sepconv1_  (None, 9, 9, 728)    0      ['add_9[0][0]']      Y          
##  act (Activation)                                                               
##  block12_sepconv1   (None, 9, 9, 728)    5365   ['block12_sepconv1   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block12_sepconv1_  (None, 9, 9, 728)    2912   ['block12_sepconv1   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block12_sepconv2_  (None, 9, 9, 728)    0      ['block12_sepconv1   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block12_sepconv2   (None, 9, 9, 728)    5365   ['block12_sepconv2   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block12_sepconv2_  (None, 9, 9, 728)    2912   ['block12_sepconv2   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block12_sepconv3_  (None, 9, 9, 728)    0      ['block12_sepconv2   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block12_sepconv3   (None, 9, 9, 728)    5365   ['block12_sepconv3   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block12_sepconv3_  (None, 9, 9, 728)    2912   ['block12_sepconv3   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  add_10 (Add)       (None, 9, 9, 728)    0      ['block12_sepconv3   Y          
##                                                 _bn[0][0]',                     
##                                                  'add_9[0][0]']                 
##  block13_sepconv1_  (None, 9, 9, 728)    0      ['add_10[0][0]']     Y          
##  act (Activation)                                                               
##  block13_sepconv1   (None, 9, 9, 728)    5365   ['block13_sepconv1   Y          
##  (SeparableConv2D)                       36     _act[0][0]']                    
##  block13_sepconv1_  (None, 9, 9, 728)    2912   ['block13_sepconv1   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block13_sepconv2_  (None, 9, 9, 728)    0      ['block13_sepconv1   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block13_sepconv2   (None, 9, 9, 1024)   7520   ['block13_sepconv2   Y          
##  (SeparableConv2D)                       24     _act[0][0]']                    
##  block13_sepconv2_  (None, 9, 9, 1024)   4096   ['block13_sepconv2   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  conv2d_11 (Conv2D  (None, 5, 5, 1024)   7454   ['add_10[0][0]']     Y          
##  )                                       72                                     
##  block13_pool (Max  (None, 5, 5, 1024)   0      ['block13_sepconv2   Y          
##  Pooling2D)                                     _bn[0][0]']                     
##  batch_normalizati  (None, 5, 5, 1024)   4096   ['conv2d_11[0][0]'   Y          
##  on_3 (BatchNormal                              ]                               
##  ization)                                                                       
##  add_11 (Add)       (None, 5, 5, 1024)   0      ['block13_pool[0][   Y          
##                                                 0]',                            
##                                                  'batch_normalizat              
##                                                 ion_3[0][0]']                   
##  block14_sepconv1   (None, 5, 5, 1536)   1582   ['add_11[0][0]']     Y          
##  (SeparableConv2D)                       080                                    
##  block14_sepconv1_  (None, 5, 5, 1536)   6144   ['block14_sepconv1   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block14_sepconv1_  (None, 5, 5, 1536)   0      ['block14_sepconv1   Y          
##  act (Activation)                               _bn[0][0]']                     
##  block14_sepconv2   (None, 5, 5, 2048)   3159   ['block14_sepconv1   Y          
##  (SeparableConv2D)                       552    _act[0][0]']                    
##  block14_sepconv2_  (None, 5, 5, 2048)   8192   ['block14_sepconv2   Y          
##  bn (BatchNormaliz                              [0][0]']                        
##  ation)                                                                         
##  block14_sepconv2_  (None, 5, 5, 2048)   0      ['block14_sepconv2   Y          
##  act (Activation)                               _bn[0][0]']                     
## ================================================================================
## Total params: 20861480 (79.58 MB)
## Trainable params: 20806952 (79.37 MB)
## Non-trainable params: 54528 (213.00 KB)
## ________________________________________________________________________________

Key characteristics of the Xception model include:

  • Depthwise Separable Convolutions: Xception extensively uses depthwise separable convolutions, which consist of a depthwise convolution (spatial filtering) followed by a pointwise convolution (cross-channel filtering). This separation reduces the computational cost significantly while maintaining expressive power (see the cost arithmetic after this list).

  • Feature Cross Channels: Xception enhances the information flow across channels, promoting more efficient learning of hierarchical features.

  • Fully Convolutional: The model is fully convolutional, meaning it lacks dense layers. This design choice contributes to a more uniform representation of spatial hierarchies throughout the network.

  • Skip Connections: Xception incorporates skip connections to facilitate the flow of gradients during training, promoting better convergence and preventing the vanishing gradient problem.

  • Global Average Pooling: Instead of using fully connected layers, Xception employs global average pooling at the end of the network. This helps reduce overfitting and produces a fixed-size output regardless of input size.
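
As a rough illustration of the savings from depthwise separable convolutions (my own arithmetic, for the 128-to-128-channel case): a standard 3x3 convolution learns one 3x3 kernel per input-output channel pair, while the separable version learns one 3x3 depthwise kernel per channel plus a 1x1 pointwise convolution:

3*3*128*128         # 147456 weights for a standard 3x3 convolution
3*3*128 + 128*128   # 17536 weights, roughly 8x fewer; this matches the 17536
                    # parameters of block2_sepconv2 in the summary above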

# Freeze the weights of the entire pre-trained base model
freeze_weights(base_model)

# Create new model on top
inputs <- layer_input(shape = c(150, 150, 3))
x <- inputs

# Pre-trained Xception weights require input scaling
scale_layer <- layer_rescaling(scale = 1 / 127.5, offset = -1)
x <- scale_layer(x)

# The base model contains batchnorm layers. I want to keep them in inference
# mode even when the base model is unfrozen for fine-tuning, so the base model
# is called with training = FALSE here.
x <- base_model(x, training = FALSE)
x <- layer_global_average_pooling_2d()(x)
x <- layer_dropout(x, rate = 0.2)  # Regularize with dropout
outputs <- layer_dense(units = 1)(x)

model.xception <- keras_model(inputs, outputs)

# Compile the model.xception
model.xception %>% compile(
  optimizer = optimizer_adam(),
  loss = loss_binary_crossentropy(from_logits = TRUE),
  metrics = c(metric_binary_accuracy())
)
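
Because the final dense layer has no activation, the network outputs raw logits, which is why the loss is constructed with from_logits = TRUE. To turn the model’s predictions into probabilities, the sigmoid has to be applied manually; a small sketch:

logits <- predict(model.xception, test_generator)
probs  <- 1 / (1 + exp(-logits))   # manual sigmoid: the model outputs logits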

# Fitting the top layer of the model.xception
epochs <- 5
history.xception.top=model.xception %>% fit(train_generator, 
                       epochs = epochs, 
                       validation_data = validation_generator)
history.xception.top %>%
  saveRDS(here("output/history_xception_top.rds"))

model.xception  %>%
  save_model_hdf5(here("output/faces_xception_top.h5"))
history.xception.top <- readRDS(here("output/history_xception_top.rds"))
model.xception <- load_model_hdf5(here("output/faces_xception_top.h5"))
summary(model.xception)
## Model: "model"
## ________________________________________________________________________________
##  Layer (type)                  Output Shape               Param #    Trainable  
## ================================================================================
##  input_2 (InputLayer)          [(None, 150, 150, 3)]      0          Y          
##  rescaling (Rescaling)         (None, 150, 150, 3)        0          Y          
##  xception (Functional)         (None, 5, 5, 2048)         20861480   Y          
##  global_average_pooling2d (Gl  (None, 2048)               0          Y          
##  obalAveragePooling2D)                                                          
##  dropout (Dropout)             (None, 2048)               0          Y          
##  dense (Dense)                 (None, 1)                  2049       Y          
## ================================================================================
## Total params: 20863529 (79.59 MB)
## Trainable params: 20809001 (79.38 MB)
## Non-trainable params: 54528 (213.00 KB)
## ________________________________________________________________________________
# Unfreeze the base model so it can be fine-tuned end to end
unfreeze_weights(base_model)

# Compile the model with a lower learning rate
model.xception %>% compile(
  optimizer = optimizer_adam(learning_rate = 1e-5),
  loss = loss_binary_crossentropy(from_logits = TRUE),
  metrics = c(metric_binary_accuracy())
)

# Fitting the end-to-end model
epochs <- 20
history.xception.final=model.xception %>%
  fit(train_generator, 
      epochs = epochs, 
      validation_data = validation_generator)

model.xception %>% save_model_hdf5(here("output/faces_model_final_xception.h5"))
history.xception.final %>% saveRDS(here("output/history_xception_final.rds"))
model.xception.final <- load_model_hdf5(here("output/faces_model_final_xception.h5"))
history.xception.final <- readRDS(here("output/history_xception_final.rds"))
plot(history.xception.final)

With only 20 epochs, the Xception model increases the validation accuracy from the CNN’s 91.4% to 93.1%.

Then I use the Xception model to evaluate the test data:

acc.xception <- evaluate(model.xception.final, test_generator)[[2]] * 100
## 116/116 - 121s - loss: 0.1698 - binary_accuracy: 0.9319 - 121s/epoch - 1s/step
print(sprintf('Xception Model with 20 epochs for Test data accuracy: %5.2f %%', acc.xception))
## [1] "Xception Model with 20 epochs for Test data accuracy: 93.19 %"

The final test accuracy is 93.19%.

The Xception model is very large, resulting in long runtimes of approximately an hour and a half per epoch. Due to time constraints, I limited training to 20 epochs. To balance computational efficiency and predictive performance, I next construct a more elaborate plain CNN, which can be explored thoroughly within a reasonable time frame and may further improve predictive accuracy.

3.6 Model 6: a deeper CNN

I develop a deeper CNN with six convolutional layers, with filters in the sequence 32-64-128-256-512-512. This expanded architecture aims to capture more intricate patterns in the data than the previous four-convolutional-layer CNNs.

model.extend = keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3,3),
    padding = "same",
    activation = "relu",
    input_shape = c(150,150,3)
  ) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.25) %>%  #add dropout 
  
  layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
  layer_conv_2d(filters = 256, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.25) %>%  #add dropout 
  
  layer_conv_2d(filters = 512, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.25) %>%  #add dropout 
  
  layer_conv_2d(filters = 512, kernel_size = c(3,3),activation = "relu") %>%
  layer_max_pooling_2d(c(2,2)) %>%
  layer_dropout(0.25) %>%  #add dropout 
  
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%

  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(
    optimizer=optimizer_rmsprop(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics="acc"
  )
summary(model.extend)
## Model: "sequential_2"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_17 (Conv2D)                 (None, 150, 150, 32)            896         
##  conv2d_16 (Conv2D)                 (None, 148, 148, 64)            18496       
##  max_pooling2d_11 (MaxPooling2D)    (None, 74, 74, 64)              0           
##  dropout_5 (Dropout)                (None, 74, 74, 64)              0           
##  conv2d_15 (Conv2D)                 (None, 72, 72, 128)             73856       
##  conv2d_14 (Conv2D)                 (None, 70, 70, 256)             295168      
##  max_pooling2d_10 (MaxPooling2D)    (None, 35, 35, 256)             0           
##  dropout_4 (Dropout)                (None, 35, 35, 256)             0           
##  conv2d_13 (Conv2D)                 (None, 33, 33, 512)             1180160     
##  max_pooling2d_9 (MaxPooling2D)     (None, 16, 16, 512)             0           
##  dropout_3 (Dropout)                (None, 16, 16, 512)             0           
##  conv2d_12 (Conv2D)                 (None, 14, 14, 512)             2359808     
##  max_pooling2d_8 (MaxPooling2D)     (None, 7, 7, 512)               0           
##  dropout_2 (Dropout)                (None, 7, 7, 512)               0           
##  flatten_2 (Flatten)                (None, 25088)                   0           
##  dense_5 (Dense)                    (None, 512)                     12845568    
##  dense_4 (Dense)                    (None, 1)                       513         
## ================================================================================
## Total params: 16774465 (63.99 MB)
## Trainable params: 16774465 (63.99 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
# Note: at this point train_generator is the Xception-pipeline generator defined
# above, which does not rescale, so Model 6 is trained on raw 0-255 pixel values.
model6=model.extend
history6 <- 
  model6 %>% 
  fit_generator(
    train_generator,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 50
  )

model6   %>% save_model_hdf5(here("output/faces_model8.h5"))
history6 %>% saveRDS(here("output/faces_model1_history8.rds"))
#loading from files which were prepared earlier
model6   <- load_model_hdf5(here("output/faces_model8.h5"))
history6 <- readRDS(here("output/faces_model1_history8.rds"))
plot(history6)

With the more complex Model 6, the validation accuracy improves from 91.4% (Model 4) to 92.5%.

acc.model6 <- evaluate(model6, test_generator)[[2]] * 100
## 116/116 - 172s - loss: 12.3619 - acc: 0.9359 - 172s/epoch - 1s/step
print(sprintf('optimal CNN Model with 100 epochs for Test data accuracy: %5.2f %%', acc.model6))
## [1] "optimal CNN Model with 100 epochs for Test data accuracy: 93.59 %"

Model 6 reaches a test accuracy of 93.59% after 100 epochs, the best result so far.

Therefore, Model 6 is the best-performing model in terms of predictive accuracy on the given test data.
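
For reference, the accuracies reported in this chapter, collected in one place (a dash marks values not reported):

Model                                       Validation acc.   Test acc.
Model 1: CNN, no dropout (20 epochs)        ~86%              –
Model 2: CNN + dropout (20 epochs)          –                 –
Model 3: CNN + dropout (100 epochs)         90%               –
Model 4: + data augmentation (100 epochs)   91.4%             92.30%
Model 5: Xception, fine-tuned (20 epochs)   93.1%             93.19%
Model 6: deeper CNN (100 epochs)            92.5%             93.59%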