Linear lineage plots

Setup

These are all from a successful UMAD run (log1) on the Syllables problem, drawn from the data collected for the GECCO 2018 paper introducing UMAD and its variants.

Total error over time

library(readr)  # provides read_csv()

semantics <- read_csv("~/Downloads/GECCO18/semantics.csv")

## Parsed with column specification:
## cols(
##   Generation = col_integer(),
##   Total.error = col_integer(),
##   Semantics.UUID = col_character()
## )

min_total_error <- read_csv("~/Downloads/GECCO18/min_total_error.csv")

## Parsed with column specification:
## cols(
##   Generation = col_integer(),
##   Min_total_error = col_integer()
## )

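# Side-by-side version: lineage error and population-best error per generation
# (assumes both files have one row per generation, in the same order)
column_version <- cbind(semantics, 
                        Min_total_error = min_total_error$Min_total_error)

# Long version: give both tables the same column names, then stack them
# with a `source` column recording which series each row came from
names(semantics) <- c("Generation", "Total_error", "Semantics.UUID")
names(min_total_error) <- c("Generation", "Total_error")
row_version <- dplyr::bind_rows(
  "Lineage total error" = semantics[, c(1, 2)],
  "Min total error"     = min_total_error,
  .id = "source")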

Total error

The blue line shows the total error of each individual along a successful lineage, from the initial generation to the success in generation 166. The red line shows the lowest total error in the entire population at each generation of the run. We can see that the lineage that led to (the earliest) success frequently passed through individuals that weren’t the “best” by total error, sometimes by quite a lot. At other times, however, the two lines overlap, indicating that the lineage’s individual was the “best” (by total error) in its generation.
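One way to put numbers on this comparison is to join the two series by generation and look at the gap between them. This is a minimal sketch using small made-up data frames in place of the real semantics.csv and min_total_error.csv, with column names assumed to match the renamed ones above. Joining with `merge()` on `Generation` also doesn't depend on the two files having the same row order.

```r
# Toy stand-ins (hypothetical values) for the lineage and population-best series
lineage <- data.frame(Generation = 0:4,
                      Total_error = c(120000, 500, 480, 300, 0))
pop_min <- data.frame(Generation = 0:4,
                      Total_error = c(90000, 450, 480, 250, 0))

# Join on Generation so rows line up even if the files are ordered differently
merged <- merge(lineage, pop_min, by = "Generation",
                suffixes = c("_lineage", "_min"))

# How much worse the lineage individual was than the best in its generation
merged$Gap <- merged$Total_error_lineage - merged$Total_error_min

# Generations where the two lines overlap (lineage individual was the best)
overlap_generations <- merged$Generation[merged$Gap == 0]
print(merged)
print(overlap_generations)
```

Summaries such as `sum(merged$Gap == 0)` or `max(merged$Gap)` then give the "how often" and "by how much" directly.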

The initial total error for the lineage is over 100K, so I chopped off the top of the graph.
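The plotting code for this figure isn't shown here. If it is drawn with ggplot2, one way to chop off the top is `coord_cartesian(ylim = ...)`, which only zooms the view; setting limits through `scale_y_continuous(limits = ...)` would instead turn out-of-range values into `NA` and break the line near generation 0. A minimal sketch, assuming a long-format data frame shaped like `row_version` above (a `source` column naming each series) and toy values:

```r
library(ggplot2)

# Toy long-format data (hypothetical values) shaped like row_version:
# one row per generation per series, with the series name in `source`
toy <- data.frame(
  Generation  = rep(0:4, times = 2),
  Total_error = c(120000, 500, 480, 300, 0,   # lineage
                  90000, 450, 480, 250, 0),   # population minimum
  source      = rep(c("Lineage total error", "Min total error"), each = 5)
)

p <- ggplot(toy, aes(x = Generation, y = Total_error, color = source)) +
  geom_line() +
  scale_color_manual(values = c("Lineage total error" = "blue",
                                "Min total error" = "red")) +
  # Zoom the view only: the large generation-0 values stay in the data,
  # so the lines run off the top of the panel instead of being cut
  coord_cartesian(ylim = c(0, 1000)) +
  theme_bw()
print(p)
```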

Zooming in

This zooms in on generations 40-80 of the graph above. The colored dots indicate semantics, although imperfectly: these are just R’s default color mappings, and some dots with very similar (or maybe identical?) colors actually have different semantics. We can see, however, horizontal stretches where every dot has the same color, indicating that the semantics don’t change for several generations.
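The same-colored stretches can also be measured directly instead of read off the colors. Because this compares semantics IDs rather than colors, it isn't fooled by near-identical default colors. A minimal base-R sketch using `rle()` on the lineage's semantics IDs in generation order; the short strings below are placeholders for real `Semantics.UUID` values:

```r
# Placeholder semantics IDs along a lineage, one per generation (hypothetical)
lineage <- data.frame(
  Generation     = 40:47,
  Semantics.UUID = c("a1", "a1", "a1", "b2", "b2", "c3", "c3", "c3"),
  stringsAsFactors = FALSE
)

# Run-length encode the IDs: each run is a stretch of consecutive
# generations in which the lineage's semantics did not change
runs <- rle(lineage$Semantics.UUID)

# First generation of each run, paired with how long the run lasted
run_starts <- lineage$Generation[cumsum(c(1, head(runs$lengths, -1)))]
stretches <- data.frame(Start = run_starts,
                        Length = runs$lengths,
                        Semantics = runs$values,
                        stringsAsFactors = FALSE)
print(stretches)
```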

Shawn and I were thinking of two additions to this that we haven’t gotten to yet:

Adding the best total error line to this graph as well for reference
Adding some indication of how many times each of these individuals was selected. Not entirely sure what that will look like. Size of the dot? A number floating above the dot? A thin, partially transparent bar chart behind the line plot?
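Both ideas could be added as extra ggplot2 layers. This is a rough sketch of one option (point size), not a plot we actually made: it assumes one row per lineage individual with its generation, total error, and semantics ID, plus a hypothetical `Times_selected` count that is not in the CSVs above, and a separate population-best series drawn as a dashed reference line.

```r
library(ggplot2)

# Hypothetical lineage data; Times_selected is an assumed column,
# not something present in semantics.csv
lineage <- data.frame(
  Generation     = 40:45,
  Total_error    = c(310, 310, 290, 290, 290, 250),
  Semantics.UUID = c("a1", "a1", "b2", "b2", "b2", "c3"),
  Times_selected = c(3, 8, 1, 5, 12, 2),
  stringsAsFactors = FALSE
)
# Hypothetical population-best total error for the same generations
best <- data.frame(Generation  = 40:45,
                   Total_error = c(280, 275, 275, 260, 250, 250))

p <- ggplot(lineage, aes(x = Generation, y = Total_error)) +
  # Reference line: best total error in the whole population each generation
  geom_line(data = best, linetype = "dashed", color = "red") +
  # The lineage itself
  geom_line(color = "blue") +
  # Dots colored by semantics and sized by how often each individual was selected
  geom_point(aes(color = Semantics.UUID, size = Times_selected), alpha = 0.75) +
  theme_bw()
print(p)
```

Point size keeps the plot uncluttered; floating numbers or a background bar chart would show exact counts at the cost of more visual noise.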

Change in instruction use over time

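library(dplyr)    # %>%, arrange(), mutate()
library(ggplot2)

# sep = "" splits fields on any whitespace
Classified_instruction_counts <- read.csv("~/Downloads/GECCO18/Classified_instruction_counts.csv", sep="")
# Instruction kinds, in the order they should be grouped on the y-axis
Classified_instruction_counts$Kind <- 
  factor(Classified_instruction_counts$prefix_factor, 
         levels=c("Constant", "input", "print", "boolean", "integer", 
                  "char", "string", "tags", "exec"))
# Sort by kind, then freeze that order as the Instruction factor levels
ordered_counts <- 
  Classified_instruction_counts %>% 
  arrange(Kind, Instruction) %>% 
  mutate(Instruction = factor(Instruction, levels = unique(Instruction)))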

ggplot(ordered_counts, aes(x=Generation, y=Instruction, group=Kind)) + 
  geom_point(aes(size=Count, color=Kind), alpha=0.75) + theme_bw() + 
  theme(axis.text.y=element_blank(), axis.ticks.y = element_blank()) + 
  scale_color_brewer(palette="Set1")
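The `arrange()` plus `factor(..., levels = unique(...))` step is what groups the y-axis by instruction kind: ggplot2 lays out a discrete axis in factor-level order, so fixing the levels to the sorted order keeps each kind's instructions contiguous. A base-R sketch of the same idea on a few made-up rows (the instruction names are placeholders):

```r
# A few made-up rows
toy <- data.frame(
  Kind        = factor(c("exec", "integer", "exec", "integer"),
                       levels = c("integer", "exec")),
  Instruction = c("exec_if", "integer_sub", "exec_do", "integer_add"),
  stringsAsFactors = FALSE
)

# Sort by Kind (in its declared level order), then alphabetically by name
toy <- toy[order(toy$Kind, toy$Instruction), ]

# Freeze that order into the factor levels so plots respect it
toy$Instruction <- factor(toy$Instruction, levels = unique(toy$Instruction))
print(levels(toy$Instruction))
```

Without the last step, `factor()` would fall back to alphabetical levels and instructions of different kinds would interleave on the axis.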

Change in test case errors over time

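library(tidyr)     # spread()
library(pheatmap)

errors <- read.csv("~/Downloads/GECCO18/errors.csv", sep="")
# One row per generation, one column per test case
wide <- spread(errors, Test_case, Error)
# Green (no error) through yellow to red (large error)
my_palette <- colorRampPalette(c("#1a9641", "#ffffbf", "#d7191c"))(n=26)
# 27 breaks for 26 colors, on the log(error + 1) scale: one bin for zero
# error, fine bins up to 0.9, and coarser bins up to 7
col_breaks <- c(-1, 
                seq(0.0001, 0.9, length.out=19), 
                seq(0.9001, 7, length.out=7))

# Transpose so test cases are rows and generations run left to right as columns;
# no clustering in either direction, so both keep their original order
pheatmap(t(log(as.matrix(wide[,-1])+1)), 
         color=my_palette, breaks=col_breaks, scale="none", 
         cluster_rows=FALSE, cluster_cols=FALSE, 
         show_colnames=FALSE, show_rownames=FALSE, 
         border_color=NA, 
         main="Errors (ln) over time")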
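To see which color a given error gets, the transformed values can be binned by hand. This sketch assumes pheatmap assigns colors the way `cut(..., include.lowest = TRUE)` does with the supplied breaks (right-closed bins). Under that assumption a zero error lands in the first (green) bin, while an error above e^7 - 1 (about 1096) falls outside the breaks entirely, so it would get pheatmap's NA color rather than red.

```r
# Same breaks as above: 27 break points, hence 26 bins for the 26 colors
col_breaks <- c(-1,
                seq(0.0001, 0.9, length.out = 19),
                seq(0.9001, 7, length.out = 7))

# Example raw errors, transformed the same way as the heatmap input
raw_errors <- c(0, 1, 100, 1000, 5000)
transformed <- log(raw_errors + 1)

# Bin index into the palette (1 = first color); NA means outside the breaks
bin <- cut(transformed, breaks = col_breaks, include.lowest = TRUE, labels = FALSE)
print(data.frame(raw_errors, transformed = round(transformed, 3), bin))
```

If errors that large occur in the data, raising the last break (or capping the matrix at 7 before plotting) would keep them red.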