De novo transcriptome assembly development report 2

1: All-parameter combinatorial sweep in SOAPDenovoTrans with 100k read pairs

100k read pairs were subsampled from trimmed, normalised, paired-end 100 bp rice Illumina reads. A class, ParameterSweeper, was written for the Bioinformatic Optimisation System (BIOPSY) and used to generate a full combinatorial parameter sweep for SOAPDenovoTrans.
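The report does not state how the 100k pairs were drawn. One standard way to subsample paired reads without separating mates is reservoir sampling (Algorithm R) over pair indices. The sketch below is illustrative only and is not the method actually used here: it assumes reads have already been parsed into hypothetical `(header, sequence, quality)` tuples, and `subsample_pairs` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical in-memory representation of one read: (header, sequence, quality).
Read = Tuple[str, str, str]


def subsample_pairs(
    left: Sequence[Read],
    right: Sequence[Read],
    n: int,
    seed: int = 42,
) -> List[Tuple[Read, Read]]:
    """Draw up to n read pairs uniformly at random, without replacement.

    Reservoir sampling (Algorithm R) is applied to pair indices in a single
    pass, so both mates of a selected pair are always kept together.
    """
    if len(left) != len(right):
        raise ValueError("left and right inputs must contain the same number of reads")
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    reservoir: List[int] = []
    for i in range(len(left)):
        if i < n:
            reservoir.append(i)  # fill the reservoir with the first n pairs
        else:
            # Keep pair i with probability n / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = i
    reservoir.sort()  # restore original file order in the output
    return [(left[i], right[i]) for i in reservoir]
```

A fixed seed makes the subsample reproducible, which matters when the same read set is reused across thousands of assemblies.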

13,975 assemblies were run by varying 7 parameters over the ranges shown below.
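ParameterSweeper's interface is not described in this report, so the following is only a minimal sketch of the underlying idea, assuming parameter ranges are supplied as a dict of value lists: a full combinatorial sweep is the Cartesian product of the ranges, and its size is the product of their lengths. The parameter names and values in `EXAMPLE_RANGES`, and the function names, are hypothetical placeholders, not the seven ranges or the code used in this sweep.

```python
import itertools
from math import prod
from typing import Dict, Iterator, Sequence

# Hypothetical parameter names and ranges, for illustration only; these are
# NOT the seven ranges used in the sweep described in this report.
EXAMPLE_RANGES: Dict[str, Sequence[int]] = {
    "kmer_size": range(21, 32, 2),  # 21, 23, ..., 31 (6 values)
    "min_kmer_freq": [0, 1, 2],     # 3 values
    "merge_level": [0, 1, 2, 3],    # 4 values
}


def full_sweep(ranges: Dict[str, Sequence[int]]) -> Iterator[Dict[str, int]]:
    """Yield every combination of parameter values as a {name: value} dict."""
    names = list(ranges)
    for values in itertools.product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))


def sweep_size(ranges: Dict[str, Sequence[int]]) -> int:
    """Number of assemblies a full sweep needs: the product of the range sizes."""
    return prod(len(values) for values in ranges.values())
```

Each yielded dict would correspond to one assembler run. Because the total is a product, adding one more value to a single parameter's range adds as many runs as the product of all the other range sizes (here, one extra `kmer_size` adds 3 × 4 = 12 assemblies).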

For each assembly, basic metrics were collected both during the run (e.g. time to assemble) and after assembly (N50, total length, longest contig, shortest contig, number of contigs and size of the gzipped assembly). More advanced metrics are currently being computed for the full set of assemblies (~5,000 of 13,975 completed as of 10:00 on Friday 26th July).
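The post-assembly metrics listed above can be computed directly from the contig sequences. This is a minimal sketch, not BIOPSY's implementation: `n50` and `basic_metrics` are hypothetical names, the contigs are assumed to be already loaded into a `{name: sequence}` dict, and while the keys `numSeqs`, `totLen` and `gz_size` reuse metric names used later in this report, the other keys are invented for illustration.

```python
import gzip
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def basic_metrics(contigs: Dict[str, str]) -> Dict[str, int]:
    """Basic post-assembly metrics for a {name: sequence} mapping."""
    lengths = [len(seq) for seq in contigs.values()]
    # Serialise as FASTA with 60 bp per line (an assumed convention) before gzipping.
    fasta = "".join(
        f">{name}\n" + "".join(seq[i:i + 60] + "\n" for i in range(0, len(seq), 60))
        for name, seq in contigs.items()
    )
    return {
        "numSeqs": len(lengths),
        "totLen": sum(lengths),
        "longest": max(lengths, default=0),
        "shortest": min(lengths, default=0),
        "n50": n50(lengths),
        "gz_size": len(gzip.compress(fasta.encode())),  # bytes after gzip
    }
```

For example, contigs of 100, 80, 60, 40 and 20 bp total 300 bp; the two longest already cover 180 bp, which is at least half, so the N50 is 80 bp. `gz_size` depends on line wrapping and compression level (`gzip.compress` defaults to level 9), so values from this sketch need not match those recorded in the sweep.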

Relationships between basic metrics

Pairwise

An all-vs-all plot of the collected metrics shows the pairwise relationships between them:


[Figure: all-vs-all pairwise plot of the basic assembly metrics]
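The plot shows the shape of each pairwise relationship; the strength of a linear one can also be summarised numerically with Pearson's correlation coefficient r. Below is a minimal sketch assuming each metric is stored as a list with one value per assembly (a hypothetical layout, not BIOPSY's output format); `pearson` and `pairwise_correlations` are illustrative names.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient r between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a metric is constant")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(metrics: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """r for every unordered pair of metrics: a numeric all-vs-all summary."""
    return {
        (a, b): pearson(metrics[a], metrics[b])
        for a, b in itertools.combinations(metrics, 2)
    }
```

Pearson's r only measures linear association, so a curved but monotonic relationship will score below 1 in magnitude; a rank-based measure such as Spearman's correlation is a common alternative in that case.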

Expected linear correlations are evident between the number of contigs (numSeqs), total length (totLen) and gzipped assembly size (gz_size). More complex relationships are implied between other pairs, for example:

Sets of three

We can plot three metrics together to see three-way relationships.

[Figure: three-way plots of the basic assembly metrics]

Relationships between parameters and metrics

To understand how these metrics are affected by different parameters, one or more parameters can be mapped to graphical dimensions such as colour and symbol.
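Mapping a parameter to colour or symbol amounts to building a lookup from each distinct parameter value to one visual level. The following is a minimal, plotting-library-agnostic sketch; the palettes, the function names and the list-of-dicts layout for runs are all hypothetical. Colours are hex strings, and the marker codes are valid matplotlib marker characters.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical palettes; any colour/marker codes your plotting library accepts will do.
COLOURS = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]
MARKERS = ["o", "s", "^", "D", "v", "x"]


def aesthetic_map(values: Sequence[int], palette: Sequence[str]) -> Dict[int, str]:
    """Give each distinct parameter value (in sorted order) its own palette entry."""
    levels = sorted(set(values))
    if len(levels) > len(palette):
        raise ValueError(f"{len(levels)} levels but only {len(palette)} palette entries")
    return {level: palette[i] for i, level in enumerate(levels)}


def styles_for(
    runs: List[Dict[str, int]], colour_by: str, shape_by: str
) -> List[Tuple[str, str]]:
    """(colour, marker) for each assembly run, keyed on two chosen parameters."""
    colours = aesthetic_map([run[colour_by] for run in runs], COLOURS)
    markers = aesthetic_map([run[shape_by] for run in runs], MARKERS)
    return [(colours[run[colour_by]], markers[run[shape_by]]) for run in runs]
```

Sorting the levels keeps the mapping stable across plots, so the same parameter value always gets the same colour or symbol; the palette-length check reflects that more than a handful of colours or symbols quickly becomes hard to tell apart.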

Objective function-related metrics

Objective function code: