In this project, I use a convolutional neural network (CNN) to classify gender from face images. Achieving good predictive accuracy requires attention to several key steps.
Preprocessing is crucial to the model’s performance. Techniques such as face normalization and augmentation help standardize the input images, reduce variability, and enlarge the dataset for improved robustness.
Designing a robust CNN architecture is pivotal. Convolutional layers capture hierarchical features in facial images, and pooling layers follow for spatial down-sampling. Fully connected layers perform the gender classification, and techniques such as dropout mitigate overfitting.
Hyperparameters such as the learning rate, batch size, architecture specifics, and number of epochs can significantly affect model accuracy.
Lastly, I fine-tune the model using transfer learning on a pre-trained network, Xception (a deep learning architecture built on depthwise separable convolutions), leveraging its learned features for improved performance on this gender prediction task.
The dataset contains 27,167 JPG files, of which 17,678 are photos of men’s faces and 9,489 are photos of women’s faces. Each file is renamed according to its category, e.g. woman__0, woman__1, woman_2, etc.
Then I create new folders containing three subsets:
Train: 12,000 men, 5,500 women, total 17,500
Validation: 2,500 men, 1,200 women, total 3,700
Test: 2,500 men, 1,200 women, total 3,700
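# Assumes the 'man' and 'woman' image folders exist under data/faces/original
library(here)   # project-relative paths
library(keras)  # image generators and models used below
# First, create the folder structure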
base.dir = here("data/faces")
if (!dir.exists(base.dir)) dir.create(base.dir)
original.dir = paste(base.dir,"original",sep="/")
if (!dir.exists(original.dir)) dir.create(original.dir)
original.woman.dir = paste(original.dir,"woman",sep="/")
if (!dir.exists(original.woman.dir )) dir.create(original.woman.dir)
original.man.dir = paste(original.dir,"man",sep="/")
if (!dir.exists(original.man.dir )) dir.create(original.man.dir)
train.dir = paste(base.dir,"train",sep = "/")
validation.dir = paste(base.dir,"validation",sep = "/")
test.dir = paste(base.dir,"test",sep = "/")
if (!dir.exists(train.dir)) dir.create(train.dir)
if (!dir.exists(validation.dir)) dir.create(validation.dir)
if (!dir.exists(test.dir)) dir.create(test.dir)
train.woman.dir = paste(train.dir,"woman",sep = "/")
train.man.dir = paste(train.dir,"man",sep = "/")
if (!dir.exists(train.woman.dir)) dir.create(train.woman.dir)
if (!dir.exists(train.man.dir)) dir.create(train.man.dir)
validation.woman.dir = paste(validation.dir,"woman",sep = "/")
validation.man.dir = paste(validation.dir,"man",sep = "/")
if (!dir.exists(validation.woman.dir)) dir.create(validation.woman.dir)
if (!dir.exists(validation.man.dir)) dir.create(validation.man.dir)
test.woman.dir = paste(test.dir,"woman",sep = "/")
test.man.dir = paste(test.dir,"man",sep = "/")
if (!dir.exists(test.woman.dir)) dir.create(test.woman.dir)
if (!dir.exists(test.man.dir)) dir.create(test.man.dir)
While browsing the image dataset, I found some wrongly labeled pictures: 102 images were incorrectly labeled as women, 54 as men, and 43 showed both genders or no identifiable gender. Addressing these discrepancies is important for model quality, so I removed these pictures from the original dataset.
men <- list.files(original.man.dir)
women <- list.files(original.woman.dir)
# Some pictures are mislabeled; I picked these wrongly classified pictures out of the original folders.
# Images wrongly labeled as women:
wrong.women.dir = here("data/faces/original/wrong_woman")
if (!dir.exists(wrong.women.dir)) dir.create(wrong.women.dir)
wrong_women = list.files(wrong.women.dir)
cat("wrongly classified pictures as women is ", length(wrong_women), "\n")
## wrongly classified pictures as women is 102
# Load every mislabeled image, then plot a small sample of them
image_list <- lapply(wrong_women, function(filename) {
  file_path <- file.path(wrong.women.dir, filename)
  imager::load.image(file_path)
})
par(mfrow = c(2, 4))
for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(image_list[[i]])
  title("Wrongly classified as women", line = 1)
}
# Images wrongly labeled as men:
wrong.men.dir = here("data/faces/original/wrong_man")
if (!dir.exists(wrong.men.dir)) dir.create(wrong.men.dir)
wrong_men = list.files(wrong.men.dir)
cat("wrongly classified pictures as men is ", length(wrong_men))
## wrongly classified pictures as men is 54
image_men_list <- lapply(wrong_men, function(filename) {
  file_path <- file.path(wrong.men.dir, filename)
  imager::load.image(file_path)
})
par(mfrow = c(2, 4))
for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(image_men_list[[i]])
  title("Wrongly classified as men", line = 1)
}
# Images showing both genders or no identifiable gender:
wrong.dir = here("data/faces/original/wrong")
if (!dir.exists(wrong.dir)) dir.create(wrong.dir)
wrong = list.files(wrong.dir)
cat("wrongly have both gender or no gender is ", length(wrong))
## wrongly have both gender or no gender is 43
wrong_list <- lapply(wrong, function(filename) {
  file_path <- file.path(wrong.dir, filename)
  imager::load.image(file_path)
})
par(mfrow = c(2, 4))
for (i in c(1, 2, 4, 6, 8, 10, 20, 30)) {
  plot(wrong_list[[i]])
  title("Both genders or no gender", line = 1)
}
Train: 12,000 men, 5,500 women, total 17,500.
Validation: 2,500 men, 1,200 women, total 3,700.
Test: 2,500 men, 1,200 women, total 3,700.
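The subset sizes listed above can also be produced by a small helper. This is a minimal sketch and not the code used for the results in this report: `split_files()`, the seed, and the made-up file names are hypothetical. Unlike the sequential indexing used in this report, it shuffles the file names first, on the assumption that the order of files on disk may not be random.

```r
# Hypothetical helper: shuffle file names, then cut them into
# train / validation / test subsets of the requested sizes.
split_files <- function(files, n_train, n_validation, n_test, seed = 42) {
  stopifnot(length(files) >= n_train + n_validation + n_test)
  set.seed(seed)                      # reproducible shuffle
  shuffled <- sample(files)
  list(
    train      = shuffled[seq_len(n_train)],
    validation = shuffled[n_train + seq_len(n_validation)],
    test       = shuffled[n_train + n_validation + seq_len(n_test)]
  )
}

# Made-up file names standing in for the contents of the men folder
fake_men <- sprintf("man_%d.jpg", 0:17595)
men_split <- split_files(fake_men, 12000, 2500, 2500)
sapply(men_split, length)
```

The three subsets are disjoint by construction, so no image can appear in both the training and the test folder.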
men <- list.files(original.man.dir)
length(men) # 17,678 to 17596
women <- list.files(original.woman.dir)
length(women) # 8,349 to 8195
# Copy consecutive slices of each class into the train, validation, and test folders
train.men = men[1:12000]
file.copy(from = file.path(original.man.dir, train.men),
          to = train.man.dir)
train.women = women[1:5500]
file.copy(from = file.path(original.woman.dir, train.women),
          to = train.woman.dir)
validation.men = men[12001:14500]
file.copy(from = file.path(original.man.dir, validation.men),
          to = validation.man.dir)
validation.women = women[5501:6700]
file.copy(from = file.path(original.woman.dir, validation.women),
          to = validation.woman.dir)
test.men = men[14501:17000]
file.copy(from = file.path(original.man.dir, test.men),
          to = test.man.dir)
test.women = women[6701:7900]
file.copy(from = file.path(original.woman.dir, test.women),
          to = test.woman.dir)
The data must now be formatted into appropriately pre-processed floating-point tensors. Currently, the data sets are JPEG files on disk, so I use the image_data_generator() function, which can automatically turn image files on disk into batches of pre-processed tensors.
train_generator = flow_images_from_directory(
  directory = train.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
##
## 0 1
## 12000 5500
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
##
## 0 1
## 2500 1200
test_generator = flow_images_from_directory(
  directory = test.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  color_mode = "rgb",
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
##
## 0 1
## 2500 1200
I start with a basic CNN with four convolutional layers, using 32, 64, 128, and 128 filters in sequence. Each convolution is followed by max pooling, and a 512-unit dense layer feeds a single sigmoid output unit.
model = keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu",
    input_shape = c(150, 150, 3)
  ) %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(
    optimizer = optimizer_rmsprop(learning_rate = 1e-4),
    loss = "binary_crossentropy",
    metrics = "acc"
  )
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_3 (Conv2D) (None, 150, 150, 32) 896
## max_pooling2d_3 (MaxPooling2D) (None, 75, 75, 32) 0
## conv2d_2 (Conv2D) (None, 73, 73, 64) 18496
## max_pooling2d_2 (MaxPooling2D) (None, 36, 36, 64) 0
## conv2d_1 (Conv2D) (None, 34, 34, 128) 73856
## max_pooling2d_1 (MaxPooling2D) (None, 17, 17, 128) 0
## conv2d (Conv2D) (None, 15, 15, 128) 147584
## max_pooling2d (MaxPooling2D) (None, 7, 7, 128) 0
## flatten (Flatten) (None, 6272) 0
## dense_1 (Dense) (None, 512) 3211776
## dense (Dense) (None, 1) 513
## ================================================================================
## Total params: 3453121 (13.17 MB)
## Trainable params: 3453121 (13.17 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
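The output shapes and parameter counts in the summary above can be checked by hand. The sketch below is only an arithmetic check in base R; the helper names `conv_out`, `pool_out`, and `conv_params` are illustrative and not part of keras.

```r
# Spatial size after a stride-1 convolution
conv_out <- function(size, kernel = 3, padding = c("valid", "same")) {
  padding <- match.arg(padding)
  if (padding == "same") size else size - kernel + 1
}
# Spatial size after 2x2 max pooling (odd sizes are rounded down)
pool_out <- function(size, pool = 2) size %/% pool
# Kernel weights plus one bias per filter
conv_params <- function(in_ch, filters, kernel = 3) {
  (kernel * kernel * in_ch + 1) * filters
}

s <- 150
s <- pool_out(conv_out(s, padding = "same"))  # 150 -> 150 -> 75
s <- pool_out(conv_out(s))                    # 75 -> 73 -> 36
s <- pool_out(conv_out(s))                    # 36 -> 34 -> 17
s <- pool_out(conv_out(s))                    # 17 -> 15 -> 7
flat <- s * s * 128                           # 7 * 7 * 128 = 6272

params <- c(
  conv_params(3, 32),      # 896
  conv_params(32, 64),     # 18496
  conv_params(64, 128),    # 73856
  conv_params(128, 128),   # 147584
  (flat + 1) * 512,        # 3211776
  (512 + 1) * 1            # 513
)
sum(params)                # 3453121
```

Most of the parameters sit in the first dense layer, not in the convolutions.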
# Assignment does not copy a Keras model: model1 is the same object as `model`
model1 = model
history1 = model1 %>%
  fit_generator(
    generator = train_generator,
    steps_per_epoch = 200,
    epochs = 20,
    validation_data = validation_generator,
    validation_steps = 50
  )
# Save the model and its training history
model1 %>% save_model_hdf5(here("output/faces_model1.h5"))
history1 %>% saveRDS(here("output/faces_model1_history.rds"))
# Load from the files prepared earlier
model1 <- load_model_hdf5(here("output/faces_model1.h5"))
history1 <- readRDS(here("output/faces_model1_history.rds"))
# Plot the loss and accuracy of the model over the training and validation data
plot(history1)
With only 20 epochs, the validation accuracy reaches approximately 86%. However, the history plot shows signs of overfitting. To address this, I add dropout layers to the model. This regularization technique aims to mitigate overfitting and improve the model’s generalization to unseen data.
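To make the idea of dropout concrete, here is a minimal base R sketch of inverted dropout applied to a vector of activations. It illustrates the general technique and is not a description of how keras implements `layer_dropout()` internally; the function name `dropout` and the seed are assumptions for illustration.

```r
# Inverted dropout: during training, zero each activation with
# probability `rate` and rescale the survivors by 1 / (1 - rate)
# so the expected value is unchanged. At inference, do nothing.
dropout <- function(x, rate, training = TRUE) {
  if (!training || rate == 0) return(x)
  keep <- runif(length(x)) >= rate
  x * keep / (1 - rate)
}

set.seed(1)
activations <- rep(1, 10000)
dropped <- dropout(activations, rate = 0.25)
mean(dropped == 0)   # fraction of zeroed units, close to 0.25
mean(dropped)        # close to 1, the original mean
```

Because a different random subset of units is silenced on every batch, the network cannot rely on any single unit, which is what discourages memorizing the training images.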
model.dropout = keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu",
    input_shape = c(150, 150, 3)
  ) %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_dropout(0.25) %>% # add dropout
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(c(2, 2)) %>%
  layer_dropout(0.5) %>% # add dropout
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(
    optimizer = optimizer_rmsprop(learning_rate = 1e-4),
    loss = "binary_crossentropy",
    metrics = "acc"
  )
summary(model.dropout)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_7 (Conv2D) (None, 150, 150, 32) 896
## max_pooling2d_7 (MaxPooling2D) (None, 75, 75, 32) 0
## conv2d_6 (Conv2D) (None, 73, 73, 64) 18496
## max_pooling2d_6 (MaxPooling2D) (None, 36, 36, 64) 0
## dropout_1 (Dropout) (None, 36, 36, 64) 0
## conv2d_5 (Conv2D) (None, 34, 34, 128) 73856
## max_pooling2d_5 (MaxPooling2D) (None, 17, 17, 128) 0
## conv2d_4 (Conv2D) (None, 15, 15, 128) 147584
## max_pooling2d_4 (MaxPooling2D) (None, 7, 7, 128) 0
## dropout (Dropout) (None, 7, 7, 128) 0
## flatten_1 (Flatten) (None, 6272) 0
## dense_3 (Dense) (None, 512) 3211776
## dense_2 (Dense) (None, 1) 513
## ================================================================================
## Total params: 3453121 (13.17 MB)
## Trainable params: 3453121 (13.17 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
model2 = model.dropout
history2 = model2 %>%
  fit_generator(
    generator = train_generator,
    steps_per_epoch = 200,
    epochs = 20,
    validation_data = validation_generator,
    validation_steps = 50
  )
model2 %>% save_model_hdf5(here("output/faces_model2.h5"))
history2 %>% saveRDS(here("output/faces_model1_history2.rds"))
# Load from the files prepared earlier
model2 <- load_model_hdf5(here("output/faces_model2.h5"))
history2 <- readRDS(here("output/faces_model1_history2.rds"))
plot(history2)
Comparing the history plots of Model 1 and Model 2, the dropout layers have reduced the overfitting. The gap between training and validation accuracy is now smaller, which indicates a model that generalizes better to the validation data.
Because the accuracy of Model 2 was still rising with the number of epochs, I extend training to 100 epochs. With more iterations, I expect the model to refine its performance on the validation set.
# Assignment does not copy a Keras model, so model3 is the same object as
# model.dropout; if model.dropout was already trained above, these epochs
# continue from those weights and do not start from scratch.
model3 = model.dropout
history3 = model3 %>%
  fit_generator(
    generator = train_generator,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 50
  )
model3 %>% save_model_hdf5(here("output/faces_model4.h5"))
history3 %>% saveRDS(here("output/faces_model1_history4.rds"))
# Load from the files prepared earlier
model3 <- load_model_hdf5(here("output/faces_model4.h5"))
history3 <- readRDS(here("output/faces_model1_history4.rds"))
plot(history3)
With training extended to 100 epochs, the validation accuracy improves from 86% to 90%.
Data augmentation is a crucial technique in machine learning for enhancing model generalization. It involves creating additional training data by applying various random transformations to existing samples, producing diverse and realistic images. By introducing these variations during training, the model becomes exposed to different aspects of the data, preventing overfitting and promoting better generalization.
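As a toy illustration of two such transformations, the base R sketch below flips and shifts a tiny matrix standing in for an image. It is not how keras performs augmentation internally; the helper names `flip_horizontal` and `shift_right` are made up for this example.

```r
# A tiny 3 x 4 "image" (rows = height, columns = width)
img <- matrix(1:12, nrow = 3, byrow = TRUE)

# Horizontal flip: reverse the column order
flip_horizontal <- function(m) m[, ncol(m):1, drop = FALSE]

# Shift right by `k` columns, filling the gap by repeating the
# left-most column (similar in spirit to fill_mode = "nearest")
shift_right <- function(m, k) {
  stopifnot(k >= 0, k < ncol(m))
  if (k == 0) return(m)
  cbind(m[, rep(1, k), drop = FALSE],
        m[, seq_len(ncol(m) - k), drop = FALSE])
}

flip_horizontal(img)
shift_right(img, 1)
```

Both results have the same shape as the input, so augmented images can be fed to the network without any change to the architecture.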
First, take a look at some augmented pictures.
fnames = list.files(train.man.dir)
img_path = fnames[[10]]
img = image_load(file.path(train.man.dir, img_path), target_size = c(150, 150))
img_array = image_to_array(img)
# Original picture
plot(img_array %>% as.raster(max = 255))
# Using data augmentation
datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
img_array = array_reshape(img_array, c(1, 150, 150, 3))
augmentation_generator = flow_images_from_data(
  img_array,
  generator = datagen,
  batch_size = 1
)
# Draw six random augmentations of the same picture
par(mfrow = c(2, 3))
for (i in 1:6) {
  batch <- generator_next(augmentation_generator)
  plot(as.raster(batch[1, , , ]))
  title("Data augmentation", line = 1)
}
Then I apply data augmentation to the training data.
# Augment the training data only
datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
train_generator <- flow_images_from_directory(
  train.dir,                 # folder with the training images
  datagen,                   # the augmenting generator
  target_size = c(150, 150),
  batch_size = 32,           # images per generated batch
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
# Validation and test images are only rescaled, never augmented
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
##
## 0 1
## 2500 1200
test_generator = flow_images_from_directory(
  directory = test.dir,
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
shear_range is for randomly applying shearing transformations.
zoom_range is for randomly zooming inside pictures.
horizontal_flip is for randomly flipping half the images horizontally; this is relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures).
fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
Finally, I train model 4 with data augmentation.
# model4 is again the same object as model.dropout (assignment does not copy),
# so training continues from whatever weights that model already has.
model4 = model.dropout
history4 <-
  model4 %>%
  fit_generator(
    train_generator,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 100
  )
model4 %>% save_model_hdf5(here("output/faces_model6.h5"))
history4 %>% saveRDS(here("output/faces_model1_history6.rds"))
# Load from the files prepared earlier
model4 <- load_model_hdf5(here("output/faces_model6.h5"))
history4 <- readRDS(here("output/faces_model1_history6.rds"))
plot(history4)
With data augmentation and 100 epochs, the validation accuracy increases from 90% to 91.4%. Model 4 is therefore currently the best model, and I use it to evaluate the test data.
## 116/116 - 15s - loss: 0.1910 - acc: 0.9230 - 15s/epoch - 130ms/step
## [1] "optimal CNN Model with 100 epochs for Test data accuracy: 92.30 %"
Model 4 (CNN with dropout and data augmentation, 100 epochs) reaches a test accuracy of 92.30%.
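To put the test accuracy in context, the sketch below derives two quantities from the numbers reported above: the number of batches needed to cover the test set, which matches the 116 steps in the log, and the accuracy of a trivial classifier that always predicts the majority class. The baseline comparison is an added sanity check under the stated class counts and was not part of the evaluation run itself.

```r
n_test_men   <- 2500
n_test_women <- 1200
n_test       <- n_test_men + n_test_women
batch_size   <- 32

# Batches needed to see every test image once
steps <- ceiling(n_test / batch_size)

# Accuracy of always predicting the majority class
baseline_acc <- max(n_test_men, n_test_women) / n_test

cat(sprintf("steps: %d\n", steps))
cat(sprintf("majority-class baseline: %.2f %%\n", 100 * baseline_acc))
cat(sprintf("gain of model 4 over baseline: %.2f points\n",
            92.30 - 100 * baseline_acc))
```

Since the classes are imbalanced, the gain over this baseline is a more informative figure than the raw accuracy alone.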
Xception is a deep learning architecture that belongs to the family of convolutional neural networks (CNNs). It was introduced by François Chollet, the creator of the Keras library, in his research paper “Xception: Deep Learning with Depthwise Separable Convolutions.”
The name “Xception” stands for “Extreme Inception,” indicating its relationship to the Inception architecture. Xception is designed to improve upon the traditional Inception modules by replacing them with depthwise separable convolutions.
# Augment the training data only.
# No rescale argument here: the model built below rescales pixel values itself.
datagen <- image_data_generator(
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
train_generator <- flow_images_from_directory(
  train.dir,                 # folder with the training images
  datagen,                   # the augmenting generator
  target_size = c(150, 150),
  batch_size = 32,           # images per generated batch
  class_mode = "binary"
)
## Found 17500 images belonging to 2 classes.
validation_generator = flow_images_from_directory(
  directory = validation.dir,
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
test_generator = flow_images_from_directory(
  directory = test.dir,
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "binary"
)
## Found 3700 images belonging to 2 classes.
# Load the pre-trained Xception model without its classification head
base_model <- application_xception(
  weights = "imagenet",
  input_shape = c(150, 150, 3),
  include_top = FALSE
)
summary(base_model)
## Model: "xception"
## ________________________________________________________________________________
## Layer (type) Output Shape Para Connected to Trainable
## m #
## ================================================================================
## input_1 (InputLay [(None, 150, 150, 0 [] Y
## er) 3)]
## block1_conv1 (Con (None, 74, 74, 32) 864 ['input_1[0][0]'] Y
## v2D)
## block1_conv1_bn ( (None, 74, 74, 32) 128 ['block1_conv1[0][ Y
## BatchNormalizatio 0]']
## n)
## block1_conv1_act (None, 74, 74, 32) 0 ['block1_conv1_bn[ Y
## (Activation) 0][0]']
## block1_conv2 (Con (None, 72, 72, 64) 1843 ['block1_conv1_act Y
## v2D) 2 [0][0]']
## block1_conv2_bn ( (None, 72, 72, 64) 256 ['block1_conv2[0][ Y
## BatchNormalizatio 0]']
## n)
## block1_conv2_act (None, 72, 72, 64) 0 ['block1_conv2_bn[ Y
## (Activation) 0][0]']
## block2_sepconv1 ( (None, 72, 72, 128 8768 ['block1_conv2_act Y
## SeparableConv2D) ) [0][0]']
## block2_sepconv1_b (None, 72, 72, 128 512 ['block2_sepconv1[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## block2_sepconv2_a (None, 72, 72, 128 0 ['block2_sepconv1_ Y
## ct (Activation) ) bn[0][0]']
## block2_sepconv2 ( (None, 72, 72, 128 1753 ['block2_sepconv2_ Y
## SeparableConv2D) ) 6 act[0][0]']
## block2_sepconv2_b (None, 72, 72, 128 512 ['block2_sepconv2[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## conv2d_8 (Conv2D) (None, 36, 36, 128 8192 ['block1_conv2_act Y
## ) [0][0]']
## block2_pool (MaxP (None, 36, 36, 128 0 ['block2_sepconv2_ Y
## ooling2D) ) bn[0][0]']
## batch_normalizati (None, 36, 36, 128 512 ['conv2d_8[0][0]'] Y
## on (BatchNormaliz )
## ation)
## add (Add) (None, 36, 36, 128 0 ['block2_pool[0][0 Y
## ) ]',
## 'batch_normalizat
## ion[0][0]']
## block3_sepconv1_a (None, 36, 36, 128 0 ['add[0][0]'] Y
## ct (Activation) )
## block3_sepconv1 ( (None, 36, 36, 256 3392 ['block3_sepconv1_ Y
## SeparableConv2D) ) 0 act[0][0]']
## block3_sepconv1_b (None, 36, 36, 256 1024 ['block3_sepconv1[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## block3_sepconv2_a (None, 36, 36, 256 0 ['block3_sepconv1_ Y
## ct (Activation) ) bn[0][0]']
## block3_sepconv2 ( (None, 36, 36, 256 6784 ['block3_sepconv2_ Y
## SeparableConv2D) ) 0 act[0][0]']
## block3_sepconv2_b (None, 36, 36, 256 1024 ['block3_sepconv2[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## conv2d_9 (Conv2D) (None, 18, 18, 256 3276 ['add[0][0]'] Y
## ) 8
## block3_pool (MaxP (None, 18, 18, 256 0 ['block3_sepconv2_ Y
## ooling2D) ) bn[0][0]']
## batch_normalizati (None, 18, 18, 256 1024 ['conv2d_9[0][0]'] Y
## on_1 (BatchNormal )
## ization)
## add_1 (Add) (None, 18, 18, 256 0 ['block3_pool[0][0 Y
## ) ]',
## 'batch_normalizat
## ion_1[0][0]']
## block4_sepconv1_a (None, 18, 18, 256 0 ['add_1[0][0]'] Y
## ct (Activation) )
## block4_sepconv1 ( (None, 18, 18, 728 1886 ['block4_sepconv1_ Y
## SeparableConv2D) ) 72 act[0][0]']
## block4_sepconv1_b (None, 18, 18, 728 2912 ['block4_sepconv1[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## block4_sepconv2_a (None, 18, 18, 728 0 ['block4_sepconv1_ Y
## ct (Activation) ) bn[0][0]']
## block4_sepconv2 ( (None, 18, 18, 728 5365 ['block4_sepconv2_ Y
## SeparableConv2D) ) 36 act[0][0]']
## block4_sepconv2_b (None, 18, 18, 728 2912 ['block4_sepconv2[ Y
## n (BatchNormaliza ) 0][0]']
## tion)
## conv2d_10 (Conv2D (None, 9, 9, 728) 1863 ['add_1[0][0]'] Y
## ) 68
## block4_pool (MaxP (None, 9, 9, 728) 0 ['block4_sepconv2_ Y
## ooling2D) bn[0][0]']
## batch_normalizati (None, 9, 9, 728) 2912 ['conv2d_10[0][0]' Y
## on_2 (BatchNormal ]
## ization)
## add_2 (Add) (None, 9, 9, 728) 0 ['block4_pool[0][0 Y
## ]',
## 'batch_normalizat
## ion_2[0][0]']
## block5_sepconv1_a (None, 9, 9, 728) 0 ['add_2[0][0]'] Y
## ct (Activation)
## block5_sepconv1 ( (None, 9, 9, 728) 5365 ['block5_sepconv1_ Y
## SeparableConv2D) 36 act[0][0]']
## block5_sepconv1_b (None, 9, 9, 728) 2912 ['block5_sepconv1[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block5_sepconv2_a (None, 9, 9, 728) 0 ['block5_sepconv1_ Y
## ct (Activation) bn[0][0]']
## block5_sepconv2 ( (None, 9, 9, 728) 5365 ['block5_sepconv2_ Y
## SeparableConv2D) 36 act[0][0]']
## block5_sepconv2_b (None, 9, 9, 728) 2912 ['block5_sepconv2[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block5_sepconv3_a (None, 9, 9, 728) 0 ['block5_sepconv2_ Y
## ct (Activation) bn[0][0]']
## block5_sepconv3 ( (None, 9, 9, 728) 5365 ['block5_sepconv3_ Y
## SeparableConv2D) 36 act[0][0]']
## block5_sepconv3_b (None, 9, 9, 728) 2912 ['block5_sepconv3[ Y
## n (BatchNormaliza 0][0]']
## tion)
## add_3 (Add) (None, 9, 9, 728) 0 ['block5_sepconv3_ Y
## bn[0][0]',
## 'add_2[0][0]']
## block6_sepconv1_a (None, 9, 9, 728) 0 ['add_3[0][0]'] Y
## ct (Activation)
## block6_sepconv1 ( (None, 9, 9, 728) 5365 ['block6_sepconv1_ Y
## SeparableConv2D) 36 act[0][0]']
## block6_sepconv1_b (None, 9, 9, 728) 2912 ['block6_sepconv1[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block6_sepconv2_a (None, 9, 9, 728) 0 ['block6_sepconv1_ Y
## ct (Activation) bn[0][0]']
## block6_sepconv2 ( (None, 9, 9, 728) 5365 ['block6_sepconv2_ Y
## SeparableConv2D) 36 act[0][0]']
## block6_sepconv2_b (None, 9, 9, 728) 2912 ['block6_sepconv2[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block6_sepconv3_a (None, 9, 9, 728) 0 ['block6_sepconv2_ Y
## ct (Activation) bn[0][0]']
## block6_sepconv3 ( (None, 9, 9, 728) 5365 ['block6_sepconv3_ Y
## SeparableConv2D) 36 act[0][0]']
## block6_sepconv3_b (None, 9, 9, 728) 2912 ['block6_sepconv3[ Y
## n (BatchNormaliza 0][0]']
## tion)
## add_4 (Add) (None, 9, 9, 728) 0 ['block6_sepconv3_ Y
## bn[0][0]',
## 'add_3[0][0]']
## block7_sepconv1_a (None, 9, 9, 728) 0 ['add_4[0][0]'] Y
## ct (Activation)
## block7_sepconv1 ( (None, 9, 9, 728) 5365 ['block7_sepconv1_ Y
## SeparableConv2D) 36 act[0][0]']
## block7_sepconv1_b (None, 9, 9, 728) 2912 ['block7_sepconv1[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block7_sepconv2_a (None, 9, 9, 728) 0 ['block7_sepconv1_ Y
## ct (Activation) bn[0][0]']
## block7_sepconv2 ( (None, 9, 9, 728) 5365 ['block7_sepconv2_ Y
## SeparableConv2D) 36 act[0][0]']
## block7_sepconv2_b (None, 9, 9, 728) 2912 ['block7_sepconv2[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block7_sepconv3_a (None, 9, 9, 728) 0 ['block7_sepconv2_ Y
## ct (Activation) bn[0][0]']
## block7_sepconv3 ( (None, 9, 9, 728) 5365 ['block7_sepconv3_ Y
## SeparableConv2D) 36 act[0][0]']
## block7_sepconv3_b (None, 9, 9, 728) 2912 ['block7_sepconv3[ Y
## n (BatchNormaliza 0][0]']
## tion)
## add_5 (Add) (None, 9, 9, 728) 0 ['block7_sepconv3_ Y
## bn[0][0]',
## 'add_4[0][0]']
## block8_sepconv1_a (None, 9, 9, 728) 0 ['add_5[0][0]'] Y
## ct (Activation)
## block8_sepconv1 ( (None, 9, 9, 728) 5365 ['block8_sepconv1_ Y
## SeparableConv2D) 36 act[0][0]']
## block8_sepconv1_b (None, 9, 9, 728) 2912 ['block8_sepconv1[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block8_sepconv2_a (None, 9, 9, 728) 0 ['block8_sepconv1_ Y
## ct (Activation) bn[0][0]']
## block8_sepconv2 ( (None, 9, 9, 728) 5365 ['block8_sepconv2_ Y
## SeparableConv2D) 36 act[0][0]']
## block8_sepconv2_b (None, 9, 9, 728) 2912 ['block8_sepconv2[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block8_sepconv3_a (None, 9, 9, 728) 0 ['block8_sepconv2_ Y
## ct (Activation) bn[0][0]']
## block8_sepconv3 ( (None, 9, 9, 728) 5365 ['block8_sepconv3_ Y
## SeparableConv2D) 36 act[0][0]']
## block8_sepconv3_b (None, 9, 9, 728) 2912 ['block8_sepconv3[ Y
## n (BatchNormaliza 0][0]']
## tion)
## add_6 (Add) (None, 9, 9, 728) 0 ['block8_sepconv3_ Y
## bn[0][0]',
## 'add_5[0][0]']
## block9_sepconv1_a (None, 9, 9, 728) 0 ['add_6[0][0]'] Y
## ct (Activation)
## block9_sepconv1 ( (None, 9, 9, 728) 5365 ['block9_sepconv1_ Y
## SeparableConv2D) 36 act[0][0]']
## block9_sepconv1_b (None, 9, 9, 728) 2912 ['block9_sepconv1[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block9_sepconv2_a (None, 9, 9, 728) 0 ['block9_sepconv1_ Y
## ct (Activation) bn[0][0]']
## block9_sepconv2 ( (None, 9, 9, 728) 5365 ['block9_sepconv2_ Y
## SeparableConv2D) 36 act[0][0]']
## block9_sepconv2_b (None, 9, 9, 728) 2912 ['block9_sepconv2[ Y
## n (BatchNormaliza 0][0]']
## tion)
## block9_sepconv3_a (None, 9, 9, 728) 0 ['block9_sepconv2_ Y
## ct (Activation) bn[0][0]']
## block9_sepconv3 ( (None, 9, 9, 728) 5365 ['block9_sepconv3_ Y
## SeparableConv2D) 36 act[0][0]']
## block9_sepconv3_b (None, 9, 9, 728) 2912 ['block9_sepconv3[ Y
## n (BatchNormaliza 0][0]']
## tion)
## add_7 (Add) (None, 9, 9, 728) 0 ['block9_sepconv3_ Y
## bn[0][0]',
## 'add_6[0][0]']
## block10_sepconv1_ (None, 9, 9, 728) 0 ['add_7[0][0]'] Y
## act (Activation)
## block10_sepconv1 (None, 9, 9, 728) 5365 ['block10_sepconv1 Y
## (SeparableConv2D) 36 _act[0][0]']
## block10_sepconv1_ (None, 9, 9, 728) 2912 ['block10_sepconv1 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block10_sepconv2_ (None, 9, 9, 728) 0 ['block10_sepconv1 Y
## act (Activation) _bn[0][0]']
## block10_sepconv2 (None, 9, 9, 728) 5365 ['block10_sepconv2 Y
## (SeparableConv2D) 36 _act[0][0]']
## block10_sepconv2_ (None, 9, 9, 728) 2912 ['block10_sepconv2 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block10_sepconv3_ (None, 9, 9, 728) 0 ['block10_sepconv2 Y
## act (Activation) _bn[0][0]']
## block10_sepconv3 (None, 9, 9, 728) 5365 ['block10_sepconv3 Y
## (SeparableConv2D) 36 _act[0][0]']
## block10_sepconv3_ (None, 9, 9, 728) 2912 ['block10_sepconv3 Y
## bn (BatchNormaliz [0][0]']
## ation)
## add_8 (Add) (None, 9, 9, 728) 0 ['block10_sepconv3 Y
## _bn[0][0]',
## 'add_7[0][0]']
## block11_sepconv1_ (None, 9, 9, 728) 0 ['add_8[0][0]'] Y
## act (Activation)
## block11_sepconv1 (None, 9, 9, 728) 5365 ['block11_sepconv1 Y
## (SeparableConv2D) 36 _act[0][0]']
## block11_sepconv1_ (None, 9, 9, 728) 2912 ['block11_sepconv1 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block11_sepconv2_ (None, 9, 9, 728) 0 ['block11_sepconv1 Y
## act (Activation) _bn[0][0]']
## block11_sepconv2 (None, 9, 9, 728) 5365 ['block11_sepconv2 Y
## (SeparableConv2D) 36 _act[0][0]']
## block11_sepconv2_ (None, 9, 9, 728) 2912 ['block11_sepconv2 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block11_sepconv3_ (None, 9, 9, 728) 0 ['block11_sepconv2 Y
## act (Activation) _bn[0][0]']
## block11_sepconv3 (None, 9, 9, 728) 5365 ['block11_sepconv3 Y
## (SeparableConv2D) 36 _act[0][0]']
## block11_sepconv3_ (None, 9, 9, 728) 2912 ['block11_sepconv3 Y
## bn (BatchNormaliz [0][0]']
## ation)
## add_9 (Add) (None, 9, 9, 728) 0 ['block11_sepconv3 Y
## _bn[0][0]',
## 'add_8[0][0]']
## block12_sepconv1_ (None, 9, 9, 728) 0 ['add_9[0][0]'] Y
## act (Activation)
## block12_sepconv1 (None, 9, 9, 728) 5365 ['block12_sepconv1 Y
## (SeparableConv2D) 36 _act[0][0]']
## block12_sepconv1_ (None, 9, 9, 728) 2912 ['block12_sepconv1 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block12_sepconv2_ (None, 9, 9, 728) 0 ['block12_sepconv1 Y
## act (Activation) _bn[0][0]']
## block12_sepconv2 (None, 9, 9, 728) 5365 ['block12_sepconv2 Y
## (SeparableConv2D) 36 _act[0][0]']
## block12_sepconv2_ (None, 9, 9, 728) 2912 ['block12_sepconv2 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block12_sepconv3_ (None, 9, 9, 728) 0 ['block12_sepconv2 Y
## act (Activation) _bn[0][0]']
## block12_sepconv3 (None, 9, 9, 728) 5365 ['block12_sepconv3 Y
## (SeparableConv2D) 36 _act[0][0]']
## block12_sepconv3_ (None, 9, 9, 728) 2912 ['block12_sepconv3 Y
## bn (BatchNormaliz [0][0]']
## ation)
## add_10 (Add) (None, 9, 9, 728) 0 ['block12_sepconv3 Y
## _bn[0][0]',
## 'add_9[0][0]']
## block13_sepconv1_ (None, 9, 9, 728) 0 ['add_10[0][0]'] Y
## act (Activation)
## block13_sepconv1 (None, 9, 9, 728) 5365 ['block13_sepconv1 Y
## (SeparableConv2D) 36 _act[0][0]']
## block13_sepconv1_ (None, 9, 9, 728) 2912 ['block13_sepconv1 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block13_sepconv2_ (None, 9, 9, 728) 0 ['block13_sepconv1 Y
## act (Activation) _bn[0][0]']
## block13_sepconv2 (None, 9, 9, 1024) 7520 ['block13_sepconv2 Y
## (SeparableConv2D) 24 _act[0][0]']
## block13_sepconv2_ (None, 9, 9, 1024) 4096 ['block13_sepconv2 Y
## bn (BatchNormaliz [0][0]']
## ation)
## conv2d_11 (Conv2D (None, 5, 5, 1024) 7454 ['add_10[0][0]'] Y
## ) 72
## block13_pool (Max (None, 5, 5, 1024) 0 ['block13_sepconv2 Y
## Pooling2D) _bn[0][0]']
## batch_normalizati (None, 5, 5, 1024) 4096 ['conv2d_11[0][0]' Y
## on_3 (BatchNormal ]
## ization)
## add_11 (Add) (None, 5, 5, 1024) 0 ['block13_pool[0][ Y
## 0]',
## 'batch_normalizat
## ion_3[0][0]']
## block14_sepconv1 (None, 5, 5, 1536) 1582 ['add_11[0][0]'] Y
## (SeparableConv2D) 080
## block14_sepconv1_ (None, 5, 5, 1536) 6144 ['block14_sepconv1 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block14_sepconv1_ (None, 5, 5, 1536) 0 ['block14_sepconv1 Y
## act (Activation) _bn[0][0]']
## block14_sepconv2 (None, 5, 5, 2048) 3159 ['block14_sepconv1 Y
## (SeparableConv2D) 552 _act[0][0]']
## block14_sepconv2_ (None, 5, 5, 2048) 8192 ['block14_sepconv2 Y
## bn (BatchNormaliz [0][0]']
## ation)
## block14_sepconv2_ (None, 5, 5, 2048) 0 ['block14_sepconv2 Y
## act (Activation) _bn[0][0]']
## ================================================================================
## Total params: 20861480 (79.58 MB)
## Trainable params: 20806952 (79.37 MB)
## Non-trainable params: 54528 (213.00 KB)
## ________________________________________________________________________________
Key characteristics of the Xception model include:
Depthwise Separable Convolutions: Xception extensively uses depthwise separable convolutions, which consist of a depthwise convolution (spatial filtering) followed by a pointwise convolution (cross-channel filtering). This separation reduces the computational cost significantly while maintaining expressive power.
Cross-Channel Features: The pointwise (1x1) convolutions mix information across channels after the spatial filtering step, so spatial and cross-channel correlations are learned separately.
Fully Convolutional: Loaded without its classification head, as it is here, the base model contains no dense layers and outputs a 5x5x2048 feature map for 150x150 inputs.
Skip Connections: Xception incorporates residual skip connections (the Add layers in the summary above) to ease the flow of gradients during training, which promotes convergence and mitigates the vanishing gradient problem.
Global Average Pooling: Instead of using fully connected layers, Xception employs global average pooling at the end of the network. This helps reduce overfitting and produces a fixed-size output regardless of input size.
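To make the saving from depthwise separable convolutions concrete, the following sketch counts the weights of a 3x3 layer with 728 input and 728 output channels, the shape of block13_sepconv1 in the summary above. The helper names are hypothetical, and bias terms are left out on the assumption that the separable layers carry none, which is consistent with the 536,536 parameters reported in the summary.

```r
# Hypothetical helpers for illustration; neither is part of keras.

# Weights of a standard k x k convolution (no bias term).
standard_conv_params <- function(c_in, c_out, k = 3) {
  k * k * c_in * c_out
}

# Weights of a depthwise separable convolution (no bias term):
# one k x k depthwise filter per input channel, then a 1 x 1 pointwise mix.
sepconv_params <- function(c_in, c_out, k = 3) {
  k * k * c_in + c_in * c_out
}

sep <- sepconv_params(728, 728)        # 536536, as listed for block13_sepconv1
std <- standard_conv_params(728, 728)  # 4769856
std / sep                              # the standard layer needs roughly 8.9 times more weights
```

The same formula reproduces the other separable layers in the summary, for example sepconv_params(728, 1024) for block13_sepconv2.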
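# Freeze the whole base_model. Flagging only base_model$layers[[1]] (the input
# layer) leaves all other layers trainable, which is the state recorded in the
# summary below.
base_model$trainable <- FALSE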
# Create new model on top
inputs <- layer_input(shape = c(150, 150, 3))
x <- inputs
# Pre-trained Xception weights require input scaling
scale_layer <- layer_rescaling(scale = 1 / 127.5, offset = -1)
x <- scale_layer(x)
# The base model contains batch normalization layers. Calling it with
# training = FALSE keeps them in inference mode, even after the base model
# is unfrozen for fine-tuning.
x <- base_model(x, training = FALSE)
x <- layer_global_average_pooling_2d()(x)
x <- layer_dropout(x, rate = 0.2) # Regularize with dropout
outputs <- layer_dense(units = 1)(x)
model.xception <- keras_model(inputs, outputs)
# Compile the model.xception
model.xception %>% compile(
optimizer = optimizer_adam(),
loss = loss_binary_crossentropy(from_logits = TRUE),
metrics = c(metric_binary_accuracy())
)
# Fitting the top layer of the model.xception
epochs <- 5
history.xception.top <- model.xception %>%
  fit(train_generator,
      epochs = epochs,
      validation_data = validation_generator)
history.xception.top %>%
  saveRDS(here("output/history_xception_top.rds"))
model.xception %>%
  save_model_hdf5(here("output/faces_xception_top.h5"))
## Model: "model"
## ________________________________________________________________________________
## Layer (type) Output Shape Param # Trainable
## ================================================================================
## input_2 (InputLayer) [(None, 150, 150, 3)] 0 Y
## rescaling (Rescaling) (None, 150, 150, 3) 0 Y
## xception (Functional) (None, 5, 5, 2048) 20861480 Y
## global_average_pooling2d (Gl (None, 2048) 0 Y
## obalAveragePooling2D)
## dropout (Dropout) (None, 2048) 0 Y
## dense (Dense) (None, 1) 2049 Y
## ================================================================================
## Total params: 20863529 (79.59 MB)
## Trainable params: 20809001 (79.38 MB)
## Non-trainable params: 54528 (213.00 KB)
## ________________________________________________________________________________
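Because the dense output layer has no activation and the loss is compiled with from_logits = TRUE, predictions from this model are raw logits rather than probabilities. A minimal sketch of the conversion in base R follows. The function names are hypothetical, and the mapping of class 1 to "woman" is an assumption based on image generators indexing class folders alphabetically; it should be checked against train_generator$class_indices.

```r
# Hypothetical helpers for illustration; they are not keras functions.

# Convert a logit to a probability with the logistic (sigmoid) function.
logit_to_prob <- function(logit) {
  plogis(logit)  # same as 1 / (1 + exp(-logit))
}

# Map logits to class labels. A logit above 0 means a probability above 0.5.
# ASSUMPTION: class 0 is "man" and class 1 is "woman" (alphabetical order).
logit_to_label <- function(logit, labels = c("man", "woman")) {
  labels[(logit > 0) + 1]
}

logit_to_prob(0)              # 0.5
logit_to_label(c(-2.3, 0.7))  # "man" "woman"
```

The logits passed in here are made-up inputs to show the mapping, not outputs of the trained model.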
# Allow the whole base_model to be trainable (not only its first layer)
base_model$trainable <- TRUE
# Compile the model with a lower learning rate
model.xception %>% compile(
optimizer = optimizer_adam(learning_rate = 1e-5),
loss = loss_binary_crossentropy(from_logits = TRUE),
metrics = c(metric_binary_accuracy())
)
# Fitting the end-to-end model
epochs <- 20
history.xception.finnal <- model.xception %>%
  fit(train_generator,
      epochs = epochs,
      validation_data = validation_generator)
model.xception %>% save_model_hdf5(here("output/faces_model_finanl_xception.h5"))
history.xception.finnal %>% saveRDS(here("output/history_xception_final.rds"))
# Load the model and history saved by the run above
model.xception.finnal <- load_model_hdf5(here("output/faces_model_finanl_xception.h5"))
history.xception.finnal <- readRDS(here("output/history_xception_final.rds"))
plot(history.xception.finnal)
With only 20 epochs of fine-tuning, the Xception model raises the validation accuracy from the 91.4% achieved by the CNN model (Model 4) to 93.1%.
I then use the Xception model to evaluate the test data.
## 116/116 - 121s - loss: 0.1698 - binary_accuracy: 0.9319 - 121s/epoch - 1s/step
## [1] "Xception Model with 20 epochs for Test data accuracy: 93.19 %"
Final test data accuracy is 93.19%.
The Xception model is very large, resulting in lengthy runtimes of approximately one and a half hours per epoch. Due to time constraints, I limited the training to 20 epochs. To balance computational efficiency and predictive performance, I next construct a custom CNN that is more intricate than the earlier ones but can be trained for many more epochs within a reasonable time frame, which may improve predictive accuracy.
I plan to develop a deeper CNN with six convolutional layers, with filters in the sequence 32-64-128-256-512-512. This expanded architecture aims to capture more intricate patterns in the data than the previous four CNN models.
model.extend = keras_model_sequential() %>%
layer_conv_2d(
filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(150,150,3)
) %>%
layer_conv_2d(filters = 64, kernel_size = c(3,3),activation = "relu") %>%
layer_max_pooling_2d(c(2,2)) %>%
layer_dropout(0.25) %>% #add dropout
layer_conv_2d(filters = 128, kernel_size = c(3,3),activation = "relu") %>%
layer_conv_2d(filters = 256, kernel_size = c(3,3),activation = "relu") %>%
layer_max_pooling_2d(c(2,2)) %>%
layer_dropout(0.25) %>% #add dropout
layer_conv_2d(filters = 512, kernel_size = c(3,3),activation = "relu") %>%
layer_max_pooling_2d(c(2,2)) %>%
layer_dropout(0.25) %>% #add dropout
layer_conv_2d(filters = 512, kernel_size = c(3,3),activation = "relu") %>%
layer_max_pooling_2d(c(2,2)) %>%
layer_dropout(0.25) %>% #add dropout
layer_flatten() %>%
layer_dense(units = 512, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid") %>%
compile(
optimizer=optimizer_rmsprop(learning_rate=1e-4),
loss="binary_crossentropy",
metrics="acc"
)
summary(model.extend)
## Model: "sequential_2"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_17 (Conv2D) (None, 150, 150, 32) 896
## conv2d_16 (Conv2D) (None, 148, 148, 64) 18496
## max_pooling2d_11 (MaxPooling2D) (None, 74, 74, 64) 0
## dropout_5 (Dropout) (None, 74, 74, 64) 0
## conv2d_15 (Conv2D) (None, 72, 72, 128) 73856
## conv2d_14 (Conv2D) (None, 70, 70, 256) 295168
## max_pooling2d_10 (MaxPooling2D) (None, 35, 35, 256) 0
## dropout_4 (Dropout) (None, 35, 35, 256) 0
## conv2d_13 (Conv2D) (None, 33, 33, 512) 1180160
## max_pooling2d_9 (MaxPooling2D) (None, 16, 16, 512) 0
## dropout_3 (Dropout) (None, 16, 16, 512) 0
## conv2d_12 (Conv2D) (None, 14, 14, 512) 2359808
## max_pooling2d_8 (MaxPooling2D) (None, 7, 7, 512) 0
## dropout_2 (Dropout) (None, 7, 7, 512) 0
## flatten_2 (Flatten) (None, 25088) 0
## dense_5 (Dense) (None, 512) 12845568
## dense_4 (Dense) (None, 1) 513
## ================================================================================
## Total params: 16774465 (63.99 MB)
## Trainable params: 16774465 (63.99 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
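The output shapes and parameter counts in this summary can be checked by hand. The sketch below reproduces them in base R under the layer settings used above (3x3 kernels, "same" padding only on the first convolution, 2x2 max pooling); the helper names are hypothetical and are not keras functions.

```r
# Hypothetical helpers for illustration; they are not keras functions.

# Parameters of a k x k Conv2D layer with bias.
conv2d_params <- function(c_in, c_out, k = 3) (k * k * c_in + 1) * c_out

# Parameters of a dense layer with bias.
dense_params <- function(n_in, n_out) (n_in + 1) * n_out

# Channel counts: RGB input followed by the six convolutional layers.
channels <- c(3, 32, 64, 128, 256, 512, 512)
conv_total <- sum(conv2d_params(head(channels, -1), tail(channels, -1)))

# Spatial size: "same" padding keeps it, an unpadded 3x3 convolution removes
# 2 pixels, and 2x2 max pooling halves it (rounding down).
ops <- c("same", "valid", "pool", "valid", "valid", "pool",
         "valid", "pool", "valid", "pool")
step <- function(size, op) {
  switch(op, same = size, valid = size - 2, pool = size %/% 2)
}
final_size <- Reduce(step, ops, 150)   # 7

flat_units <- final_size^2 * 512       # 25088, the flatten layer
total <- conv_total +
  dense_params(flat_units, 512) +      # 12845568
  dense_params(512, 1)                 # 513
total                                  # 16774465, matching "Total params"
```

The arithmetic shows that the first dense layer holds most of the weights (about 12.8 of the 16.8 million), which is typical when a flatten layer feeds a wide dense layer.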
model6 <- model.extend
# fit() accepts generators directly; fit_generator() is deprecated in recent keras releases
history6 <-
  model6 %>%
  fit(
    train_generator,
    steps_per_epoch = 200,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 50
  )
model6 %>% save_model_hdf5(here("output/faces_model8.h5"))
history6 %>% saveRDS(here("output/faces_model1_history8.rds"))
# Load the model and history from the files prepared earlier
model6 <- load_model_hdf5(here("output/faces_model8.h5"))
history6 <- readRDS(here("output/faces_model1_history8.rds"))
plot(history6)
After implementing a more intricate model (Model 6), the validation accuracy improved from 91.4% (Model 4) to 92.5%.
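The training call for Model 6 caps each epoch at 200 training steps and 50 validation steps, so an epoch here does not necessarily cover the full data. The sketch below works out the coverage under the assumption of a batch size of 32. That value is not stated in this section; it is only inferred from the 116 evaluation steps reported for the 3,700 test images. The subset sizes come from the data split described earlier.

```r
# ASSUMPTION: batch size of 32, inferred from the evaluation log, not stated in the text.
batch_size <- 32

n_train <- 17500
n_val   <- 3700
n_test  <- 3700

# Consistency check of the assumption against the evaluation log (116 steps).
test_steps <- ceiling(n_test / batch_size)          # 116

# Images seen per epoch with steps_per_epoch = 200.
images_per_epoch <- 200 * batch_size                # 6400
train_coverage   <- images_per_epoch / n_train      # about 0.37 of the training set

# Steps that one full pass over the training set would need.
full_pass_steps <- ceiling(n_train / batch_size)    # 547

# Validation images scored per epoch with validation_steps = 50.
val_images_per_epoch <- 50 * batch_size             # 1600 of 3700
```

If the batch size differs, the same arithmetic applies with the actual value. Under this assumption, the validation accuracy reported per epoch is computed on a subset of the validation images.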
## 116/116 - 172s - loss: 12.3619 - acc: 0.9359 - 172s/epoch - 1s/step
## [1] "optimal CNN Model with 100 epochs for Test data accuracy: 93.59 %"
Model 6, trained for 100 epochs, reaches a test accuracy of 93.59%, the best result so far.
Model 6 is therefore considered the best-performing model in terms of predictive accuracy on the test data.
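To put the reported test accuracies in context, the sketch below converts them into approximate counts of correctly classified test images and compares them with a majority-class baseline. It uses only the accuracies reported above and the test split of 2,500 men and 1,200 women. The variable names are hypothetical, and the counts are rounded estimates rather than logged results.

```r
# Test split as described in the data preparation: 2500 men, 1200 women.
n_test <- 2500 + 1200

# Test accuracies reported in the text.
test_acc <- c(xception = 0.9319, model6 = 0.9359)

# Baseline: always predicting the majority class ("man").
baseline_acc <- 2500 / n_test                 # about 0.676

# Approximate number of correctly classified test images per model.
approx_correct <- round(test_acc * n_test)

# Approximate difference between the two models, in images.
gap_images <- unname(approx_correct["model6"] - approx_correct["xception"])
gap_images
```

Both models are well above the majority-class baseline. The gap between them corresponds to only a small number of test images, so the ranking of Model 6 over Xception is best read as a narrow margin.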