StyleGAN2 - Augmentation Strategies

Discussion Required:

Since we terminated the first GAN training run, we need to decide how to proceed.
I would appreciate some input on potential next steps:
- What kind of data processing makes sense?
- Can we apply standard techniques to the GAN?
- How does this impact the overall timeline?

Quick summary of GAN training run No.1

Last week's training run was terminated approximately a third of the way through. The current hypothesis is that the network is too dense for a sample as homogeneous as ours, which leads to significant overfitting. We notice a few important things here:

- FID falls very quickly, much faster than in the original StyleGAN2 examples trained on FFHQ
- FID then flattens out equally quickly
- PPL struggles to stabilize after its initial spike

Here are the summary plots for FID and PPL that show the training trajectory.

FID: Fréchet Inception Distance

PPL W-end: Perceptual Path Length in W latent space

In this document I summarize some of the ‘next steps’ we can take to improve the model's training performance. These strategies are primarily informed by the GAN’s trajectory on key indicators like FID and PPL W-end, which were computed throughout the first training run. We split these potential solutions into four sections:

More comprehensive data cleaning
- Deletion of potential non-face images using face-detection
Network configuration
- Reducing network density configuration
Internal image augmentation
- Image inversion/mirroring on a subset
- Edge detection filter
- Contrast enhancement
External image augmentation
- FFHQ (512x512) inclusion

Each of these strategies is explained in broad terms, along with some potential Python solutions that should make implementation fairly easy.

No. 1 - Comprehensive Data Cleaning and other prelims

We need to ensure that all images in our data are actual faces. Running a face-detection script identifies 125 potential non-face images, which we should delete from our training data.

Deletion of potential non-face images using face-detection
- Using the same face-detection network that Logan used to crop the mugshots, we can delete the 125 images that have a high likelihood of not containing actual faces
- Potentially reduces “white spot” artifacts in FID spikes
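
The filtering step could be sketched as follows. The detector itself is not shown: `count_faces` is a hypothetical callable standing in for whatever face-detection network was used for the cropping, and it is assumed to return the number of faces found in an image file. The function only separates the file list; it does not delete anything.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def split_by_face(
    paths: Iterable[Path],
    count_faces: Callable[[Path], int],  # hypothetical wrapper around our detector
) -> Tuple[List[Path], List[Path]]:
    """Return (keep, flagged): images with at least one detected face, and the rest."""
    keep: List[Path] = []
    flagged: List[Path] = []
    for path in paths:
        # An image with zero detections is a potential non-face image.
        if count_faces(path) >= 1:
            keep.append(path)
        else:
            flagged.append(path)
    return keep, flagged
```

It may be safer to move the flagged files into a separate folder and check them by eye before deleting them, since the detector can also miss real faces.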

No. 2 - Network Configuration

Besides ensuring completely clean data, we can also reduce the density of the network configuration, which should reduce both the potential for overfitting and the total training time. Essentially, a lower number of activated nodes will reduce the network's ability to memorize the data.

Reducing network density configuration
- Reduce the network configuration to something smaller than the full-density config F network
- A lower total number of parameters should reduce overfitting
- Should decrease total training time (need to figure out by how much)
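
To get a feel for what a less dense network means, here is a rough sketch of the per-resolution channel counts. It assumes the channel rule I believe the reference StyleGAN2 code uses (channels = `fmap_base / 2**stage`, clipped to a maximum of 512), with `fmap_base = 16 << 10` for the full network and `8 << 10` for the reduced one. Both the rule and the values are assumptions that should be checked against the code we are actually running.

```python
def feature_maps(stage: int, fmap_base: int, fmap_decay: float = 1.0,
                 fmap_min: int = 1, fmap_max: int = 512) -> int:
    """Channel count at a stage: fmap_base / 2**(stage * decay), clipped to [min, max]."""
    raw = int(fmap_base / 2.0 ** (stage * fmap_decay))
    return max(fmap_min, min(fmap_max, raw))


def channels_by_resolution(fmap_base: int, resolution: int = 512) -> dict:
    """Map each layer resolution (4, 8, ..., resolution) to its channel count."""
    top = resolution.bit_length() - 1  # log2 of a power-of-two resolution
    return {2 ** res: feature_maps(res - 1, fmap_base) for res in range(2, top + 1)}


full = channels_by_resolution(16 << 10)    # assumed full-density setting
reduced = channels_by_resolution(8 << 10)  # assumed reduced setting

for res in full:
    print(f"{res:>4}x{res:<4} full={full[res]:>3}  reduced={reduced[res]:>3}")
```

Under these assumptions the low-resolution layers are unchanged (both are capped at 512 channels) and only the layers from 64x64 upward are halved, so the reduction in capacity is concentrated in the fine-detail layers.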

No. 3 - Internal Image Augmentation

These are fairly standard strategies for increasing the variance of the input images. This should help regularize the network and reduce the overfitting effects that we believe are the root cause of the sub-par training convergence.

1. Image color inversion on a subset

Using ImageOps.invert(...) from the PIL library in Python, we can invert a randomly sampled subset of our image set
All our mugshots have almost exactly the same color scheme, so this is a pretty convincing candidate for potential overfitting
This will reduce the homogeneity in the RGB space that the GAN sees in the training set by inverting the RGB values of each image as seen below:
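
A minimal sketch of the color inversion on a random subset, assuming Pillow is installed and the images are already loaded as `PIL.Image` objects. The function name, the `fraction` default of 0.25 and the fixed `seed` are illustrative choices, not settings we have agreed on.

```python
import random

from PIL import Image, ImageOps


def invert_subset(images, fraction=0.25, seed=0):
    """Return a new list in which a random `fraction` of the images is color-inverted."""
    rng = random.Random(seed)  # fixed seed so the chosen subset is reproducible
    n_invert = round(fraction * len(images))
    chosen = set(rng.sample(range(len(images)), k=n_invert))
    return [
        # Convert to RGB first so the inversion also works for other image modes.
        ImageOps.invert(img.convert("RGB")) if i in chosen else img
        for i, img in enumerate(images)
    ]
```

Each inverted pixel is simply 255 minus the original value per channel, so the operation is lossless and can be undone by applying it again.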

2. Image orientation inversion on a subset

Again, using ImageOps.flip(...) and ImageOps.mirror(...) from PIL, we can take a subset of our mugshots and invert their orientation
As with the previous point, here we want to decrease the ability of the GAN to overfit on orientation
Mirror a subset horizontally, rotate a subset by 45 degrees, and flip a subset vertically, as seen below:
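
The three orientation changes could look roughly like this, again assuming Pillow. The helper name `orientation_variants` is made up for illustration; choosing which images go into the subset would work the same way as for the color inversion.

```python
from PIL import Image, ImageOps


def orientation_variants(img, angle=45):
    """Return the mirrored, flipped and rotated versions of one image."""
    return {
        "mirror": ImageOps.mirror(img),  # left-right (horizontal) mirroring
        "flip": ImageOps.flip(img),      # top-bottom (vertical) flip
        # Counter-clockwise rotation; the output keeps the original size,
        # so the corners are cut off and the uncovered areas are filled with black.
        "rotate": img.rotate(angle),
    }
```

Note that the 45-degree rotation introduces black corner regions that the GAN will also learn, which may be worth keeping in mind when deciding how large that subset should be.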

3. Edge detection filter

Here we can use a standard Sobel filter to generate a set of edge-enhanced images, each of which combines the vertical and horizontal edges of the original mugshot.
The GAN notably struggled with picking up edges on the mugshots, so this may help it learn those features.
As an example, this should produce an image that looks something like this:
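
A minimal sketch of the Sobel step, assuming NumPy and a grayscale image given as a 2-D array. It computes the horizontal and vertical gradients with the standard 3x3 Sobel kernels and combines them into a gradient magnitude. Rescaling the result to 0..255 is my own choice for saving the output as an image.

```python
import numpy as np


def sobel_edges(gray):
    """Sobel gradient magnitude of a 2-D grayscale array, rescaled to uint8 0..255."""
    # Pad by one pixel (repeating the border) so the output has the input's shape.
    g = np.pad(np.asarray(gray, dtype=np.float64), 1, mode="edge")

    # Horizontal gradient (responds to vertical edges): right column minus left column.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical gradient (responds to horizontal edges): bottom row minus top row.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])

    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:  # avoid dividing by zero on a completely flat image
        magnitude = magnitude * (255.0 / peak)
    return magnitude.astype(np.uint8)
```

For a mugshot loaded with PIL, the input array would be `np.asarray(img.convert("L"))`, and the result can be turned back into an image with `Image.fromarray(...)`.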

No. 4 - External Image Augmentation

1. FFHQ (512x512) inclusion

Include other non-mugshot images to increase the variance of the input data
Potential problems with image editing
How do we constrain our space?

2. Additional mugshot scraping

To reduce potential problems with face editing, we could scrape more mugshot images
Increases in sample size reduce the ability of the GAN to overfit and memorize

Point for Discussion

All of these data processing techniques expand the space the GAN will learn.
As such, the GAN will be able to generate mugshots that change orientation
- We will need to find a way to constrain this when editing faces
- We want to keep some key characteristics constant
  - Color scheme
  - Face orientation
  - Age/Demographics (if including non-mugshots)
