Reference, Resources and Data

Nature Protocols volume 8, pages 1494-1512 (2013)

Machine: CentOS Linux 7 on VirtualBox

Starting directory and subdirectory contents

Note that S_pombe_refTrans.fasta and samples_n_reads_described.txt are not used for de novo assembly.

The contents are now in a different working directory: trinity2

trinity2$ ls -l

-rw-rw-r-- 1 ubuntu ubuntu 703222969 Mar 4 23:53 ALL.LEFT.fq

-rw-rw-r-- 1 ubuntu ubuntu 703222969 Mar 4 23:54 ALL.RIGHT.fq

-rw-rw-r-- 1 ubuntu ubuntu 175846179 Feb 6 2013 Sp.ds.1M.left.fq

-rw-rw-r-- 1 ubuntu ubuntu 175846179 Feb 6 2013 Sp.ds.1M.right.fq

-rw-rw-r-- 1 ubuntu ubuntu 175736042 Feb 6 2013 Sp.hs.1M.left.fq

-rw-rw-r-- 1 ubuntu ubuntu 175736042 Feb 6 2013 Sp.hs.1M.right.fq

-rw-rw-r-- 1 ubuntu ubuntu 175741215 Feb 6 2013 Sp.log.1M.left.fq

-rw-rw-r-- 1 ubuntu ubuntu 175741215 Feb 6 2013 Sp.log.1M.right.fq

-rw-rw-r-- 1 ubuntu ubuntu 175899533 Feb 6 2013 Sp.plat.1M.left.fq

-rw-rw-r-- 1 ubuntu ubuntu 175899533 Feb 6 2013 Sp.plat.1M.right.fq
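The byte counts above suggest that ALL.LEFT.fq and ALL.RIGHT.fq are plain concatenations of the four per-sample files, although the listing does not say how they were produced. A quick arithmetic check, using only the sizes shown in the listing:

```shell
# Sum the sizes of the four left-read files from the listing above
# (ds, hs, log, plat) and compare with the size listed for ALL.LEFT.fq.
total=$((175846179 + 175736042 + 175741215 + 175899533))
echo "$total"
[ "$total" -eq 703222969 ] && echo "sizes match ALL.LEFT.fq"
```

The right-read files have the same sizes as their left-read partners, so the same sum applies to ALL.RIGHT.fq.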

trinity2$ ls -l trinity_out_dir/

-rw-rw-r-- 1 ubuntu ubuntu 9882912 Mar 24 21:09 Trinity.fasta

-rw-rw-r-- 1 ubuntu ubuntu 414104 Mar 24 21:09 Trinity.fasta.gene_trans_map

-rw-rw-r-- 1 ubuntu ubuntu 656 Mar 24 21:09 Trinity.timing

-rw-rw-r-- 1 ubuntu ubuntu 411748816 Mar 24 18:18 both.fa

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:18 both.fa.ok

-rw-rw-r-- 1 ubuntu ubuntu 8 Mar 24 18:18 both.fa.read_count

drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 24 18:29 chrysalis

-rw-rw-r-- 1 ubuntu ubuntu 15101793 Mar 24 18:23 inchworm.K25.L25.fa

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:23 inchworm.K25.L25.fa.finished

-rw-rw-r-- 1 ubuntu ubuntu 9 Mar 24 18:20 inchworm.kmer_count

drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 24 18:18 insilico_read_normalization

-rw-rw-r-- 1 ubuntu ubuntu 971810373 Mar 24 18:19 jellyfish.kmers.fa

-rw-rw-r-- 1 ubuntu ubuntu 1863 Mar 24 18:19 jellyfish.kmers.fa.histo

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:18 left.fa.ok

-rw-rw-r-- 1 ubuntu ubuntu 830570 Mar 24 18:29 partitioned_reads.files.list

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:29 partitioned_reads.files.list.ok

-rw-rw-r-- 1 ubuntu ubuntu 2134 Mar 24 18:29 pipeliner.10579.cmds

-rw-rw-r-- 1 ubuntu ubuntu 2134 Mar 24 20:54 pipeliner.16726.cmds

drwxrwxr-x 3 ubuntu ubuntu 4096 Mar 24 18:29 read_partitions

-rw-rw-r-- 1 ubuntu ubuntu 2944084 Mar 24 18:29 recursive_trinity.cmds

-rw-rw-r-- 1 ubuntu ubuntu 2944084 Mar 24 21:09 recursive_trinity.cmds.completed

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:29 recursive_trinity.cmds.ok

-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 24 18:18 right.fa.ok

-rw-rw-r-- 1 ubuntu ubuntu 1042464724 Mar 24 18:26 scaffolding_entries.sam

Align and estimate abundance

Run Trinity's built-in Perl script align_and_estimate_abundance.pl (on CentOS 7, in the working directory trinity2) to map the original reads back to the newly assembled transcripts and estimate their abundance with RSEM.

trinity2$ align_and_estimate_abundance.pl --prep_reference --seqType fq --est_method RSEM --aln_method bowtie --left Sp.ds.1M.left.fq --right Sp.ds.1M.right.fq --transcripts trinity_out_dir/Trinity.fasta --output_dir Sp_ds

Some log readouts
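The run above depends on several external programs whose names appear in the log readouts (bowtie-build, bowtie, samtools and the RSEM executables). A minimal pre-flight sketch, assuming all of them are expected on PATH; the helper name missing_tools is hypothetical:

```shell
# Report which of the tools named in this section are missing from PATH.
# Prints one name per missing tool; prints nothing if all are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing_tools align_and_estimate_abundance.pl bowtie bowtie-build \
  samtools rsem-prepare-reference rsem-calculate-expression
```

An empty result means every tool was found; any name printed needs to be installed or added to PATH before running the script.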

CMD: touch /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.started

CMD: bowtie-build /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie

Settings:

Output files: "/home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.*.ebwt"

Line rate: 6 (line is 64 bytes)

Lines per side: 1 (side is 64 bytes)

Offset rate: 5 (one in 32)

FTable chars: 10

Strings: unpacked

Max bucket size: default

Max bucket size, sqrt multiplier: default

Max bucket size, len divisor: 4

Difference-cover sample period: 1024

Endianness: little

Actual local endianness: little

Sanity checking: disabled

Assertions: disabled

Random seed: 0

Sizeofs: void*:8, int:4, long:8, size_t:8

Input files DNA, FASTA:

/home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta

Reading reference sizes Time reading reference sizes: 00:00:01

Calculating joined length Writing header Reserving space for joined string

Joining reference sequences

Time to join reference sequences: 00:00:00

bmax according to bmaxDivN setting: 2347743

Using parameters --bmax 1760808 --dcv 1024

Doing ahead-of-time memory usage test

Passed! Constructing with these parameters: --bmax 1760808 --dcv 1024

Constructing suffix-array element generator

Building DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples V-Sorting samples time: 00:00:00 Allocating rank array Ranking v-sort output Ranking v-sort output time: 00:00:01 Invoking Larsson-Sadakane on ranks Invoking Larsson-Sadakane on ranks time: 00:00:00 Sanity-checking and returning

Building samples Reserving space for 12 sample suffixes Generating random suffixes QSorting 12 sample offsets, eliminating duplicates QSorting sample offsets, eliminating duplicates time: 00:00:00

Multikey QSorting 12 samples (Using difference cover) Multikey QSorting samples time: 00:00:00

Calculating bucket sizes

Splitting and merging Splitting and merging time: 00:00:00

Avg bucket size: 9.39098e+06 (target: 1760807) Converting suffix-array elements to index image

Allocating ftab, absorbFtab

Entering Ebwt loop

Getting block 1 of 1 No samples; assembling all-inclusive block Sorting block of length 9390975 for bucket 1 (Using difference cover) Sorting block time: 00:00:03

Returning block of 9390976 for bucket 1

Exited Ebwt loop

fchr[A]: 0 fchr[C]: 2781883 fchr[G]: 4577758 fchr[T]: 6357694 fchr[$]: 9390975

Exiting Ebwt::buildToDisk()

Returning from initFromVector

Wrote 7499008 bytes to primary EBWT file: /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.1.ebwt

Wrote 1173876 bytes to secondary EBWT file: /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.2.ebwt

Re-opening _in1 and _in2 as input streams Returning from Ebwt constructor

Headers: len: 9390975 bwtLen: 9390976 sz: 2347744 bwtSz: 2347744 lineRate: 6 linesPerSide: 1 offRate: 5 offMask: 0xffffffe0 isaRate: -1 isaMask: 0xffffffff ftabChars: 10 eftabLen: 20 eftabSz: 80 ftabLen: 1048577 ftabSz: 4194308 offsLen: 293468 offsSz: 1173872 isaLen: 0 isaSz: 0 lineSz: 64 sideSz: 64 sideBwtSz: 56 sideBwtLen: 224 numSidePairs: 20962 numSides: 41924 numLines: 41924 ebwtTotLen: 2683136 ebwtTotSz: 2683136 reverse: 0

Total time for call to driver() for forward index: 00:00:06

Reading reference sizes Time reading reference sizes: 00:00:00

Calculating joined length Writing header Reserving space for joined string

Joining reference sequences Time to join reference sequences: 00:00:01

bmax according to bmaxDivN setting: 2347743

Using parameters --bmax 1760808 --dcv 1024

Doing ahead-of-time memory usage test

Passed! Constructing with these parameters: --bmax 1760808 --dcv 1024

Constructing suffix-array element generator

Building DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples V-Sorting samples time: 00:00:00 Allocating rank array Ranking v-sort output Ranking v-sort output time: 00:00:00 Invoking Larsson-Sadakane on ranks Invoking Larsson-Sadakane on ranks time: 00:00:00 Sanity-checking and returning

Building samples Reserving space for 12 sample suffixes Generating random suffixes QSorting 12 sample offsets, eliminating duplicates QSorting sample offsets, eliminating duplicates time: 00:00:00

Multikey QSorting 12 samples (Using difference cover) Multikey QSorting samples time: 00:00:00

Calculating bucket sizes

Splitting and merging Splitting and merging time: 00:00:00 Avg bucket size: 9.39098e+06 (target: 1760807)

Converting suffix-array elements to index image Allocating ftab, absorbFtab Entering Ebwt loop

Getting block 1 of 1 No samples; assembling all-inclusive block Sorting block of length 9390975 for bucket 1 (Using difference cover) Sorting block time: 00:00:03

Returning block of 9390976 for bucket 1

Exited Ebwt loop

fchr[A]: 0 fchr[C]: 2781883 fchr[G]: 4577758 fchr[T]: 6357694 fchr[$]: 9390975 Exiting Ebwt::buildToDisk() Returning from initFromVector

Wrote 7499008 bytes to primary EBWT file: /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.rev.1.ebwt

Wrote 1173876 bytes to secondary EBWT file: /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie.rev.2.ebwt

Re-opening _in1 and _in2 as input streams Returning from Ebwt constructor

Headers: len: 9390975 bwtLen: 9390976 sz: 2347744 bwtSz: 2347744 lineRate: 6 linesPerSide: 1 offRate: 5 offMask: 0xffffffe0 isaRate: -1 isaMask: 0xffffffff ftabChars: 10 eftabLen: 20 eftabSz: 80 ftabLen: 1048577 ftabSz: 4194308 offsLen: 293468 offsSz: 1173872 isaLen: 0 isaSz: 0 lineSz: 64 sideSz: 64 sideBwtSz: 56 sideBwtLen: 224 numSidePairs: 20962 numSides: 41924 numLines: 41924 ebwtTotLen: 2683136 ebwtTotSz: 2683136 reverse: 0

Total time for backward call to driver() for mirror index: 00:00:06

CMD: touch /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM.rsem.prepped.started

CMD: rsem-prepare-reference /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM

rsem-synthesis-reference-transcripts /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM 0 0 /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta

Transcript Information File is generated!

Group File is generated!

Extracted Sequences File is generated!

rsem-preref /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM.transcripts.fa 1 /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM

Refs.makeRefs finished!

Refs.saveRefs finished!

/home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM.idx.fa is generated!

/home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM.n2g.idx.fa is generated!

$VAR1 = [ { 'output_dir' => 'Sp_ds', 'right' => '/home/bdash/Desktop/trinity2/Sp.ds.1M.right.fq', 'left' => '/home/bdash/Desktop/trinity2/Sp.ds.1M.left.fq' } ];

CMD: set -o pipefail && bowtie -q --all --best --strata -m 300 --chunkmbs 512 -X 800 -S -p 4 /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.bowtie -1 /home/bdash/Desktop/trinity2/Sp.ds.1M.left.fq -2 /home/bdash/Desktop/trinity2/Sp.ds.1M.right.fq | samtools view -F 4 -S -b -o bowtie.bam -

# reads processed: 1000000

# reads with at least one reported alignment: 904300 (90.43%)

# reads that failed to align: 95700 (9.57%)

Reported 1038742 paired-end alignments

CMD: touch bowtie.bam.ok

CMD: rsem-calculate-expression --paired-end -p 4 --no-bam-output --bam bowtie.bam /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM RSEM

rsem-parse-alignments /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM RSEM.temp/RSEM RSEM.stat/RSEM bowtie.bam 3 -tag XM

Parsed 1000000 entries

Done!

rsem-build-read-index 32 1 0 RSEM.temp/RSEM_alignable_1.fq RSEM.temp/RSEM_alignable_2.fq

Build Index RSEM.temp/RSEM_alignable_1.fq is Done!

Build Index RSEM.temp/RSEM_alignable_2.fq is Done!

rsem-run-em /home/bdash/Desktop/trinity2/trinity_out_dir/Trinity.fasta.RSEM 3 RSEM RSEM.temp/RSEM RSEM.stat/RSEM -p 4

Refs.loadRefs finished!

Thread 0 : N = 226285, NHit = 259685

Thread 1 : N = 226380, NHit = 259685

Thread 2 : N = 225931, NHit = 259685

Thread 3 : N = 225704, NHit = 259687

EM_init finished!

estimateFromReads, N1 finished.

ROUND = 1, SUM = 904300, bChange = 141.218, totNum = 9249

ROUND = 2, SUM = 904300, bChange = 0.995926, totNum = 1827

ROUND = 3, SUM = 904300.000000001, bChange = 0.950174, totNum = 1740

ROUND = 4, SUM = 904300, bChange = 0.907734, totNum = 1601

…………………………. ………………………… …………………………

ROUND = 988, SUM = 904300, bChange = 0.00389822, totNum = 1

ROUND = 989, SUM = 904300, bChange = 0.0038979, totNum = 1

ROUND = 990, SUM = 904300, bChange = 0.000977773, totNum = 0

Expression Results are written!

Time Used for EM.cpp : 0 h 01 m 20 s

rm -rf RSEM.temp

CMD: touch RSEM.isoforms.results.ok
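The bowtie summary in the log above reports 904300 of 1000000 read pairs with at least one alignment. A small sketch for pulling that rate out of a saved copy of the summary; the function name alignment_rate is hypothetical, and the input is assumed to use exactly the "# reads ..." lines shown in the log:

```shell
# Print the alignment rate (percent) from a bowtie summary read on stdin.
alignment_rate() {
  awk -F': ' '
    /reads processed/                 { total = $2 }
    /at least one reported alignment/ { split($2, a, " "); aligned = a[1] }
    END { if (total > 0) printf "%.2f\n", 100 * aligned / total }
  '
}

# Summary lines copied from the log readout above.
alignment_rate <<'EOF'
# reads processed: 1000000
# reads with at least one reported alignment: 904300 (90.43%)
# reads that failed to align: 95700 (9.57%)
EOF
```

The same 904300 appears as SUM in every RSEM EM round, which is the number of read pairs passed on for abundance estimation.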

Run the other read pairs (hs, log and plat) in the same way.

Additional files in the directory and subdirectory following the above run
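The remaining samples can be handled with a loop rather than by retyping the command. The sketch below only prints the commands (a dry run); it assumes the Sp.<sample>.1M.left.fq / Sp.<sample>.1M.right.fq file names from the directory listing and reuses the flags of the ds run unchanged. The helper name build_cmd is hypothetical.

```shell
# Build the abundance-estimation command for one sample (ds, hs, log or plat).
# File names follow the Sp.<sample>.1M.{left,right}.fq pattern in the listing.
build_cmd() {
  local sample="$1"
  echo align_and_estimate_abundance.pl --prep_reference \
    --seqType fq --est_method RSEM --aln_method bowtie \
    --left "Sp.${sample}.1M.left.fq" --right "Sp.${sample}.1M.right.fq" \
    --transcripts trinity_out_dir/Trinity.fasta \
    --output_dir "Sp_${sample}"
}

# Dry run: print the three remaining commands without executing them.
for sample in hs log plat; do
  build_cmd "$sample"
done
```

To execute rather than print, the loop's output can be piped to bash from inside trinity2.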

trinity2$ ls -l trinity_out_dir/

-rw-rw-r--. 1 bdash bdash 7499008 Mar 24 17:50 Trinity.fasta.bowtie.1.ebwt

-rw-rw-r--. 1 bdash bdash 1173876 Mar 24 17:50 Trinity.fasta.bowtie.2.ebwt

-rw-rw-r--. 1 bdash bdash 83249 Mar 24 17:50 Trinity.fasta.bowtie.3.ebwt

-rw-rw-r--. 1 bdash bdash 2347744 Mar 24 17:50 Trinity.fasta.bowtie.4.ebwt

-rw-rw-r--. 1 bdash bdash 0 Mar 24 17:50 Trinity.fasta.bowtie.ok

-rw-rw-r--. 1 bdash bdash 7499008 Mar 24 17:50 Trinity.fasta.bowtie.rev.1.ebwt

-rw-rw-r--. 1 bdash bdash 1173876 Mar 24 17:50 Trinity.fasta.bowtie.rev.2.ebwt

-rw-rw-r--. 1 bdash bdash 45143 Mar 24 17:50 Trinity.fasta.RSEM.grp

-rw-rw-r--. 1 bdash bdash 9630408 Mar 24 17:50 Trinity.fasta.RSEM.idx.fa

-rw-rw-r--. 1 bdash bdash 9630408 Mar 24 17:50 Trinity.fasta.RSEM.n2g.idx.fa

-rw-rw-r--. 1 bdash bdash 0 Mar 24 17:50 Trinity.fasta.RSEM.rsem.prepped.ok

-rw-rw-r--. 1 bdash bdash 10298243 Mar 24 17:50 Trinity.fasta.RSEM.seq

-rw-rw-r--. 1 bdash bdash 808833 Mar 24 17:50 Trinity.fasta.RSEM.ti

-rw-rw-r--. 1 bdash bdash 9630408 Mar 24 17:50 Trinity.fasta.RSEM.transcripts.fa

ls -l trinity2

drwxrwxr-x. 3 bdash bdash 149 Mar 24 17:58 Sp_ds

drwxrwxr-x. 3 bdash bdash 149 Mar 24 18:11 Sp_hs

drwxrwxr-x. 3 bdash bdash 149 Mar 24 18:28 Sp_log

drwxrwxr-x. 3 bdash bdash 149 Mar 24 18:20 Sp_plat

trinity2$ ls -l Sp_ds/

-rw-rw-r--. 1 bdash bdash 737366 Mar 24 17:58 RSEM.genes.results

-rw-rw-r--. 1 bdash bdash 771154 Mar 24 17:58 RSEM.isoforms.results

trinity2$ ls -l Sp_ds/RSEM.stat/

-rw-rw-r--. 1 bdash bdash 135 Mar 24 17:56 RSEM.cnt

-rw-rw-r--. 1 bdash bdash 80229 Mar 24 17:58 RSEM.model

-rw-rw-r--. 1 bdash bdash 325878 Mar 24 17:58 RSEM.theta

Runs for the hs, plat and log samples create the Sp_hs, Sp_plat and Sp_log directories with equivalent contents.

Some file readouts

trinity2$ head Sp_ds/RSEM.isoforms.results

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct

TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1_i1 1221 957.41 27.00 29.20 31.19 100.00

TRINITY_DN1000_c0_g1_i1 TRINITY_DN1000_c0_g1_i1 571 307.41 8.00 26.94 28.78 100.00

TRINITY_DN1000_c0_g2_i1 TRINITY_DN1000_c0_g2_i1 3590 3326.41 67.00 20.85 22.27 100.00

TRINITY_DN1002_c0_g1_i1 TRINITY_DN1002_c0_g1_i1 333 73.99 1.00 13.99 14.95 100.00

TRINITY_DN1003_c0_g1_i1 TRINITY_DN1003_c0_g1_i1 1868 1604.41 257.00 165.84 177.14 100.00

TRINITY_DN1004_c0_g1_i1 TRINITY_DN1004_c0_g1_i1 3669 3405.41 76.00 23.11 24.68 100.00

TRINITY_DN1005_c0_g1_i1 TRINITY_DN1005_c0_g1_i1 922 658.41 6.00 9.43 10.08 100.00

TRINITY_DN1005_c0_g2_i1 TRINITY_DN1005_c0_g2_i1 424 160.43 0.00 0.00 0.00 0.00

TRINITY_DN1006_c0_g1_i1 TRINITY_DN1006_c0_g1_i1 1295 1031.41 90.00 90.34 96.49 100.00
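The FPKM column in the rows above is numerically consistent with FPKM = 10^9 × expected_count / (effective_length × N), where N = 904300 is the aligned read-pair count from the log. This is a check against the tabulated values, not a statement of how RSEM computes the column internally; the helper name fpkm is hypothetical.

```shell
# Recompute FPKM for rows of the listing above, assuming
# FPKM = 1e9 * expected_count / (effective_length * N).
fpkm() {
  # $1 = expected_count, $2 = effective_length, $3 = aligned read pairs (N)
  awk -v c="$1" -v l="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", 1e9 * c / (l * n) }'
}

fpkm 27.00 957.41 904300   # TRINITY_DN0_c0_g1_i1, listed FPKM 31.19
fpkm 8.00 307.41 904300    # TRINITY_DN1000_c0_g1_i1, listed FPKM 28.78
```

Both calls reproduce the listed values to two decimal places.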

trinity2$ head Sp_ds/RSEM.genes.results

gene_id transcript_id(s) length effective_length expected_count TPM FPKM

TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1_i1 1221.00 957.41 27.00 29.20 31.19

TRINITY_DN1000_c0_g1_i1 TRINITY_DN1000_c0_g1_i1 571.00 307.41 8.00 26.94 28.78

TRINITY_DN1000_c0_g2_i1 TRINITY_DN1000_c0_g2_i1 3590.00 3326.41 67.00 20.85 22.27

TRINITY_DN1002_c0_g1_i1 TRINITY_DN1002_c0_g1_i1 333.00 73.99 1.00 13.99 14.95

TRINITY_DN1003_c0_g1_i1 TRINITY_DN1003_c0_g1_i1 1868.00 1604.41 257.00 165.84 177.14

TRINITY_DN1004_c0_g1_i1 TRINITY_DN1004_c0_g1_i1 3669.00 3405.41 76.00 23.11 24.68

TRINITY_DN1005_c0_g1_i1 TRINITY_DN1005_c0_g1_i1 922.00 658.41 6.00 9.43 10.08

TRINITY_DN1005_c0_g2_i1 TRINITY_DN1005_c0_g2_i1 424.00 160.43 0.00 0.00 0.00

TRINITY_DN1006_c0_g1_i1 TRINITY_DN1006_c0_g1_i1 1295.00 1031.41 90.00 90.34 96.49
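Some entries, such as TRINITY_DN1005_c0_g2_i1 above, receive no expression in this sample. A minimal sketch for listing them, assuming TPM is the sixth whitespace-separated column as in both readouts above; the function name unexpressed is hypothetical:

```shell
# List IDs (column 1) whose TPM (column 6) is zero, skipping the header line.
unexpressed() {
  awk 'NR > 1 && $6 == 0 { print $1 }' "$1"
}

# Show the first few unexpressed entries if the results file is present.
if [ -f Sp_ds/RSEM.genes.results ]; then
  unexpressed Sp_ds/RSEM.genes.results | head
fi
```

The same function works on RSEM.isoforms.results, since TPM sits in the sixth column there as well.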

