To plot the cluster cluster counts in cool ways with ggplot I think we’re going to want the data to be in “long” form in a shared data frame. That will let us do things like color and group and facet based on lots of different features.
Currently the cluster counts come back as lists, like below:
cluster_count1 = c(5,7,11,11,11,13,15,15,27,29,42,41,40,49,46,33,28,31,34,31,27,30,31,31,27,23,20,27,24,29,20,21,26,23,23,22,23,21,20,29,23,21,17,20,25,23,24,23,27,24,16,27,22,25,19,22,28,20,23,22,22,22,23,20,23,27,25,25,23,26,19,28,24,23,29,24,22,25,21,21,18,21,20,23,21,20,23,22,24,28,34,28,27,25,29,26,26,20,23,25,28,25,30,26,23,26,34,29,25,21,29,29,25,30,31,22,39,34,33,35,30,27,23,18,29,27,24,26,22,17)
data6_clustering = c()
data6_clustering$count_at_height20 = c(7,8,6,8,11,20,28,32,30,53,73,93,92,98,102,108,113,119,107,106,119,114,115,130,122,127,132,133,135,132,159,140,148,148,139,140,145,145,135,159,148,124,118,106,72,62,53,60,47,56,57,47,46,57,61,57,59,54,58,51,52,50,58,57,58,64,58,51,59,65,60,60,61,70,67,60,56,47,64,66,61,69,92,88,62,69,83,60)
data6_clustering$count_at_height40 = c(2,2,2,2,2,4,5,6,3,4,9,9,13,16,11,14,15,14,13,10,10,7,8,12,8,13,13,12,11,11,9,9,9,12,13,11,13,11,14,18,19,17,20,22,18,12,8,15,8,10,11,10,10,9,10,10,10,8,8,7,9,6,11,12,13,11,9,7,12,11,11,9,10,13,10,11,9,8,15,15,10,10,21,18,15,14,22,22)
The following function converts one of the above lists into a data.frame. It also adds a number of pieces of datat that may help us with the grouping later:
Tom: Are there other fields that you think we should be including?
make_frame_from_counts <- function (run_number, problem, treatment,
succeeded, height, counts) {
num_gens = length(counts)
result = data.frame(run.num = rep(run_number, num_gens),
problem = rep(problem, num_gens),
treatment = rep(treatment, num_gens),
succeeded = rep(succeeded, num_gens),
height = rep(height, num_gens),
generation = seq(0, num_gens-1),
cluster.count = counts)
return(result)
}
Now that we can convert the data into data.frames, we can use ggplot to make some nice plots.
library("ggplot2")
d1_h20 <- make_frame_from_counts(1, "RSWN", "lexicase", TRUE, 20, cluster_count1)
ggplot(d1_h20, aes(x=generation, y=cluster.count)) + geom_line()
d6_h20 <- make_frame_from_counts(6, "RSWN", "lexicase", TRUE, 20,
data6_clustering$count_at_height20)
ggplot(d6_h20, aes(x=generation, y=cluster.count)) + geom_line()
d6_h40 <- make_frame_from_counts(6, "RSWN", "lexicase", TRUE, 40,
data6_clustering$count_at_height40)
ggplot(d6_h40, aes(x=generation, y=cluster.count)) + geom_line()
Now that everything is in nice data.frames we can rbind them all together into a single big frame and plot them in interesting ways.
all <- rbind(d1_h20, d6_h20, d6_h40)
ggplot(all, aes(x=generation, y=cluster.count,
group=interaction(run.num, height),
color=interaction(run.num, factor(height)))) +
geom_line() +
labs(color = "Run . Height")
This combined plot makes it much easier to compare the cluster counts from different runs. We can see the relative heights of the cluster counts, for example, and when different runs end.
It will be interesting when we have these counts from hundreds of runs from different problems, selection mechanisms, and the the like.