100k read pairs were subsampled from trimmed, normalised, paired 100bp rice Illumina reads. A class, ParameterSweeper, was written for the Bioinformatic Optimisation System (BIOPSY) and used to generate a full combinatorial parameter sweep for SOAPdenovo-Trans.
13,975 assemblies were conducted by varying 7 parameters with the ranges shown below.
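A full combinatorial sweep of this kind is the Cartesian product of the per-parameter value lists. The sketch below illustrates the idea in Python rather than with the ParameterSweeper class itself. The function name, parameter names and values are hypothetical placeholders, not the seven parameters or ranges used in this analysis.

```python
from itertools import product


def full_sweep(ranges):
    """Yield one dict per combination of parameter values (Cartesian product)."""
    names = list(ranges)
    for values in product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameters and values, for illustration only.
example_ranges = {
    "kmer": [21, 25, 29],
    "min_coverage": [1, 2],
    "merge_level": [0, 1, 2, 3],
}

combinations = list(full_sweep(example_ranges))
print(len(combinations))  # 3 * 2 * 4 = 24 combinations
print(combinations[0])    # first combination: the first value of every range
```

The number of assemblies is the product of the range sizes, so adding a single value to any one range multiplies the total run count.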
For each assembly, basic metrics were collected, both during the run (e.g. time to assemble) and post-assembly (N50, total length, longest contig, shortest contig, number of contigs, size of gzipped assembly). More advanced metrics are still being computed for the full set of assemblies (~5,000 of 13,975 completed as of 10:00 Friday 26th July).
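As a minimal sketch of how the post-assembly length metrics can be derived, the function below computes them from a list of contig lengths. It is an illustration under assumptions, not the code used for these runs: the function name and the keys `longest`, `shortest` and `N50` are made up here, while `numSeqs` and `totLen` follow the column names used in the plots. Run time and gzipped size are not covered.

```python
def assembly_metrics(contig_lengths):
    """Summarise an assembly from a list of contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: the length of the contig at which the running total, taken from
    # the longest contig downwards, first reaches half of the total length.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "numSeqs": len(lengths),
        "totLen": total,
        "longest": lengths[0] if lengths else 0,
        "shortest": lengths[-1] if lengths else 0,
        "N50": n50,
    }


# Toy input: five contigs, 1500 bases in total.
metrics = assembly_metrics([100, 200, 300, 400, 500])
print(metrics)
```

For the toy input, the two longest contigs (500 + 400) are the first to cover at least half of the 1500 total bases, so the N50 is 400.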
An all-vs-all plot of the collected data shows pairwise relationships between them:
Expected linear correlations are evident between number of contigs (numSeqs), total length (totLen) and gz_size. More complex relationships are implied between other pairs, for example:
We can plot three metrics together to see three-way relationships.
To understand how these metrics are affected by different parameters, we can assign one or more parameters to graphical dimensions including colour and symbol.
Objective function code: