Last week's training run was terminated approximately one third of the way through. The current hypothesis is that the network is too dense for a sample as homogeneous as ours, which leads to significant overfitting. We notice a few important things here:
Here are the summary plots for FID and PPL that show the training trajectory.
FID: Fréchet Inception Distance
PPL W-end: Perceptual Path Length in W-latent space
In this document I summarize some of the 'next steps' we can take to improve the model's training performance. These strategies are primarily informed by the GAN's trajectory on key indicators like FID and PPL W-end, which were computed throughout the first training run. We split these potential solutions into four sections:
More comprehensive data cleaning
Network configuration
Internal image augmentation
External image augmentation
Each of these strategies is explained in broad terms, with some potential Python solutions that should make for fairly easy implementation.
We need to ensure that all images in our data are actual faces. Running a face-detection script identifies 125 potential non-face images, which we should delete from our training data.
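The cleaning step could be organised roughly as follows. This is a minimal sketch, not the script that produced the 125 flagged images: `has_face` is a hypothetical callable standing in for whatever face detector is used, and the function names are illustrative. It also moves flagged files into a quarantine folder rather than deleting them outright, so that they can be reviewed first; that is a design choice of this sketch, not something the run above did.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def split_by_face_detector(
    paths: Iterable[Path],
    has_face: Callable[[Path], bool],  # hypothetical detector: True if a face is found
) -> Tuple[List[Path], List[Path]]:
    """Partition image paths into (keep, flagged) using the supplied detector."""
    keep, flagged = [], []
    for path in paths:
        (keep if has_face(path) else flagged).append(path)
    return keep, flagged


def quarantine(flagged: Iterable[Path], quarantine_dir: Path) -> List[Path]:
    """Move flagged images out of the training folder instead of deleting them."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in flagged:
        target = quarantine_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Once the quarantined images have been checked by eye, the folder can be removed in one step.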
We can delete 125 images which have a high likelihood of not containing actual faces.
Besides ensuring completely clean data, we can also reduce the density of the network configuration, which should both reduce the possibility of overfitting and reduce total training time. Essentially, a lower number of activated nodes will reduce the network's ability to memorize the data.
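To give a rough sense of how much capacity a thinner network removes, the sketch below counts the weights of a single convolutional layer. The channel widths (512 and 256) and the 3x3 kernel are illustrative assumptions, not values taken from our configuration, and it treats "fewer activated nodes" as "fewer channels per layer".

```python
def conv_params(c_in: int, c_out: int, kernel: int = 3, bias: bool = True) -> int:
    """Number of trainable parameters in one 2D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


# Illustrative widths only: halving both channel counts cuts the
# weight count of the layer to roughly a quarter.
full_width = conv_params(512, 512)
half_width = conv_params(256, 256)
reduction = full_width / half_width
```

Because the weight count grows with the product of input and output channels, even a moderate reduction in width removes a large share of the parameters the network could use to memorize individual images.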
F: full-density network
These are fairly standard strategies to increase the variance of the input images, which should aid the regularization of the network and reduce potential overfitting effects, which we believe are the root cause of the sub-par training convergence.
Using the PIL.ImageOps.invert(...) function in Python, we can invert a randomly sampled subset of our image set. We can widen the RGB space that the GAN sees in the training set by inverting the RGB values of each image, as seen below:
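A minimal sketch of the inversion step, assuming the images are already loaded as Pillow `Image` objects. The function name, the `fraction` of images to invert, and the fixed `seed` are illustrative assumptions rather than settings from the training run.

```python
import random
from typing import List, Sequence

from PIL import Image, ImageOps


def inverted_copies(
    images: Sequence[Image.Image],
    fraction: float = 0.25,  # assumed share of the set to invert
    seed: int = 0,           # fixed seed so the subset is reproducible
) -> List[Image.Image]:
    """Return colour-inverted copies of a randomly sampled subset of `images`."""
    rng = random.Random(seed)
    n_invert = round(len(images) * fraction)
    chosen = rng.sample(range(len(images)), n_invert)
    # ImageOps.invert maps every channel value v to 255 - v.
    # Converting to RGB first avoids problems with palette or alpha modes.
    return [ImageOps.invert(images[i].convert("RGB")) for i in chosen]
```

The returned copies would be added to the training folder alongside the originals, which are left untouched.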
Using PIL.ImageOps.flip(...) and PIL.ImageOps.mirror(...), we can take a subset of our mugshots and invert their orientation.
We can use a Sobel filter to generate a set of edge-enhanced images, which essentially correspond to a composition of the vertical and horizontal edges from the original mugshot.
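The orientation and edge augmentations could be sketched as follows, assuming Pillow and NumPy are available. The function names are illustrative, and the Sobel step is written out by hand on a grayscale copy of the image and rescaled to the 0 to 255 range; that is one reasonable reading of "edge-enhanced", not necessarily the exact filter intended above.

```python
from typing import Tuple

import numpy as np
from PIL import Image, ImageOps


def mirrored_and_flipped(img: Image.Image) -> Tuple[Image.Image, Image.Image]:
    """Return (left-right mirrored, top-bottom flipped) copies of `img`."""
    return ImageOps.mirror(img), ImageOps.flip(img)


def sobel_edges(img: Image.Image) -> Image.Image:
    """Return a grayscale image of the Sobel gradient magnitude of `img`."""
    gray = np.asarray(img.convert("L"), dtype=np.float32)
    height, width = gray.shape
    # Replicate the border so the output keeps the input size.
    padded = np.pad(gray, 1, mode="edge")

    kernel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    kernel_y = kernel_x.T  # responds to horizontal edges

    grad_x = np.zeros_like(gray)
    grad_y = np.zeros_like(gray)
    # Slide the 3x3 kernels over the image by summing shifted windows
    # (a correlation; the sign difference to a convolution does not
    # affect the magnitude computed below).
    for i in range(3):
        for j in range(3):
            window = padded[i:i + height, j:j + width]
            grad_x += kernel_x[i, j] * window
            grad_y += kernel_y[i, j] * window

    magnitude = np.hypot(grad_x, grad_y)
    if magnitude.max() > 0:
        magnitude = magnitude * (255.0 / magnitude.max())
    return Image.fromarray(magnitude.astype(np.uint8))
```

Mirroring is usually the safer of the two orientation changes for face data, since an upside-down mugshot is not something the generator is expected to reproduce; whether to include flipped images is worth deciding before adding them to the training set.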