Steven asked me today to run our C. virginica BS-seq data through the Bismark pipeline, so this notebook will serve as the first step in that process.

First, we need some data files.

setwd("~/Documents/owl/nightingales/C_virginica")
list.files(pattern = "\\.gz$")  # pattern is a regular expression, not a shell glob
 [1] "2112_lane1_ACAGTG_L001_R1_001.fastq.gz"  "2112_lane1_ACAGTG_L001_R1_002.fastq.gz"  "2112_lane1_ATCACG_L001_R1_001.fastq.gz" 
 [4] "2112_lane1_ATCACG_L001_R1_002.fastq.gz"  "2112_lane1_ATCACG_L001_R1_003.fastq.gz"  "2112_lane1_CAGATC_L001_R1_001.fastq.gz" 
 [7] "2112_lane1_CAGATC_L001_R1_002.fastq.gz"  "2112_lane1_CAGATC_L001_R1_003.fastq.gz"  "2112_lane1_GCCAAT_L001_R1_001.fastq.gz" 
[10] "2112_lane1_GCCAAT_L001_R1_002.fastq.gz"  "2112_lane1_NoIndex_L001_R1_001.fastq.gz" "2112_lane1_NoIndex_L001_R1_002.fastq.gz"
[13] "2112_lane1_NoIndex_L001_R1_003.fastq.gz" "2112_lane1_NoIndex_L001_R1_004.fastq.gz" "2112_lane1_NoIndex_L001_R1_005.fastq.gz"
[16] "2112_lane1_NoIndex_L001_R1_006.fastq.gz" "2112_lane1_NoIndex_L001_R1_007.fastq.gz" "2112_lane1_NoIndex_L001_R1_008.fastq.gz"
[19] "2112_lane1_NoIndex_L001_R1_009.fastq.gz" "2112_lane1_NoIndex_L001_R1_010.fastq.gz" "2112_lane1_NoIndex_L001_R1_011.fastq.gz"
[22] "2112_lane1_NoIndex_L001_R1_012.fastq.gz" "2112_lane1_NoIndex_L001_R1_013.fastq.gz" "2112_lane1_NoIndex_L001_R1_014.fastq.gz"
[25] "2112_lane1_NoIndex_L001_R1_015.fastq.gz" "2112_lane1_TGACCA_L001_R1_001.fastq.gz"  "2112_lane1_TTAGGC_L001_R1_001.fastq.gz" 
[28] "2112_lane1_TTAGGC_L001_R1_002.fastq.gz" 

Those are some files! A couple of weeks ago it came up that we have both demultiplexed and non-demultiplexed data in the same directory. The NoIndex files are the non-demultiplexed data, which we don’t want.

files.vec <- c("2112_lane1_ACAGTG_L001_R1_001.fastq.gz", "2112_lane1_ACAGTG_L001_R1_002.fastq.gz", "2112_lane1_ATCACG_L001_R1_001.fastq.gz", "2112_lane1_ATCACG_L001_R1_002.fastq.gz", "2112_lane1_ATCACG_L001_R1_003.fastq.gz", "2112_lane1_CAGATC_L001_R1_001.fastq.gz", "2112_lane1_CAGATC_L001_R1_002.fastq.gz", "2112_lane1_CAGATC_L001_R1_003.fastq.gz", "2112_lane1_GCCAAT_L001_R1_001.fastq.gz", "2112_lane1_GCCAAT_L001_R1_002.fastq.gz", "2112_lane1_TGACCA_L001_R1_001.fastq.gz", "2112_lane1_TTAGGC_L001_R1_001.fastq.gz", "2112_lane1_TTAGGC_L001_R1_002.fastq.gz")
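Typing out the names works, but the same vector could probably be built by dropping anything with NoIndex in the name. A minimal sketch, where `all.files` is a shortened stand-in for the `list.files()` result above and `demux.files` is just a name I made up:

```r
# Shortened stand-in for the list.files() result shown above
all.files <- c("2112_lane1_ACAGTG_L001_R1_001.fastq.gz",
               "2112_lane1_NoIndex_L001_R1_001.fastq.gz",
               "2112_lane1_TTAGGC_L001_R1_002.fastq.gz")

# invert = TRUE keeps the names that do NOT match "NoIndex"
demux.files <- grep("NoIndex", all.files, value = TRUE, invert = TRUE)
print(demux.files)
```

On the real directory listing this should leave the same 13 demultiplexed files, but it is worth checking the length before trusting it.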

13 files matches the number in our Nightingales spreadsheet. These are 100 base pair single-end reads from animals that were either exposed to 25,000 ppm oil or not exposed to oil.

Now we need to copy them over to Emu.

First, make a directory.


system("mkdir ~/Documents/C-virginica-BSSeq")

setwd("~/Documents/C-virginica-BSSeq")

And copy the files over.

for (i in seq_along(files.vec)) {
  # Copy each demultiplexed file from the Owl mount into the new working directory
  system(paste0("scp ~/Documents/owl/nightingales/C_virginica/", files.vec[i], " ~/Documents/C-virginica-BSSeq/", files.vec[i]))
}

Next, we’ll run MD5 sum for each of the files to make sure that they match what’s on Nightingales.

system("md5sum *.gz > md5sums.txt")
length(which(emu.md5sums$X1 %in% night.checksums$X4))
[1] 13

Maybe it’s a little bit of cheating, but I only checked that all 13 MD5 checksums generated from the files on Emu appear somewhere in the larger Nightingales checksum file. I did not check that each checksum is paired with the matching file name.
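A stricter check would match checksums by file name instead of just testing membership. The sketch below shows the idea on two throwaway files. The file names and the `reference` table are invented for illustration, and the reference is recomputed from the same files, so it matches by construction. With real data, `reference` would come from the Nightingales checksum file.

```r
# Create two small throwaway files to checksum
tmp.dir <- tempfile("md5demo")
dir.create(tmp.dir)
writeLines("ACGT", file.path(tmp.dir, "a.fastq"))
writeLines("TTGA", file.path(tmp.dir, "b.fastq"))
files <- list.files(tmp.dir, full.names = TRUE)

# tools::md5sum() returns checksums as a character vector named by file path
local.sums <- tools::md5sum(files)

# Made-up stand-in for the reference checksum table
reference <- data.frame(file = basename(files),
                        md5 = unname(tools::md5sum(files)),
                        stringsAsFactors = FALSE)

# Look up each local file in the reference by name, then compare checksums
idx <- match(basename(names(local.sums)), reference$file)
matches <- unname(local.sums) == reference$md5[idx]
print(all(matches))
```

A file missing from the reference would give `NA` in `idx`, so `all(matches)` would come back `NA` rather than `TRUE`, which is a useful red flag.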

From here, I’m pretty sure I need to concatenate the files to make the actual 6 sample files (Pretty sure it’s 6, that’s what nightingales says!)

First, we unzip them.

setwd("~/Documents/C-virginica-BSSeq")
The working directory was changed to /home/srlab/Documents/C-virginica-BSSeq inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
files.to.unzip <- list.files(pattern = "\\.gz$")
for (i in seq_along(files.to.unzip)) {
  # Decompress each file in place; gunzip replaces foo.fastq.gz with foo.fastq
  system(paste0("gunzip ", files.to.unzip[i]))
}

Then we concatenate the different files into the 6 sample files. I couldn’t think of a good algorithmic way to do this, so I went with creating the cat commands by hand. The TGACCA sample has only one file, so it needs no concatenation.

setwd("~/Documents/C-virginica-BSSeq")
# list.files() returns names in alphabetical order, which the indices below rely on
files.to.cat <- list.files(pattern = "\\.fastq$")
system(paste("cat", files.to.cat[1], files.to.cat[2], "> 2112_lane1_ACAGTG_L001_R1.fastq"))
system(paste("cat", files.to.cat[3], files.to.cat[4], files.to.cat[5], "> 2112_lane1_ATCACG_R1.fastq"))
system(paste("cat", files.to.cat[6], files.to.cat[7], files.to.cat[8], "> 2112_lane1_CAGATC_R1.fastq"))
system(paste("cat", files.to.cat[9], files.to.cat[10], "> 2112_lane1_GCCAAT_L001_R1.fastq"))
system(paste("cat", files.to.cat[12], files.to.cat[13], "> 2112_lane1_TTAGGC_L001_R1.fastq"))
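For the record, one possible way to generate these commands is to group the file names by barcode. This is only a sketch: `build_cat_commands` is a hypothetical helper, and it assumes every chunk file follows the `<run>_lane<N>_<barcode>_L<NNN>_R1_<chunk>.fastq` naming seen above. It names every output `..._L001_R1.fastq`, which differs from the hand-written names for ATCACG and CAGATC, and it also emits a one-file `cat` for TGACCA.

```r
# Hypothetical helper: build one cat command per barcode from chunked FASTQ names
build_cat_commands <- function(files) {
  pat <- "^.*_lane[0-9]+_([ACGT]+)_L[0-9]+_R1_[0-9]+\\.fastq$"
  # Keep only chunk files with a real barcode (drops NoIndex and merged files)
  chunks <- sort(files[grepl(pat, files)])
  # Extract the barcode from each name and group the files by it
  barcodes <- sub(pat, "\\1", chunks)
  groups <- split(chunks, barcodes)
  vapply(groups, function(g) {
    # Output name: first chunk's name with the chunk number removed
    out <- sub("_R1_[0-9]+\\.fastq$", "_R1.fastq", g[1])
    paste("cat", paste(g, collapse = " "), ">", out)
  }, character(1))
}

example.files <- c("2112_lane1_ACAGTG_L001_R1_001.fastq",
                   "2112_lane1_ACAGTG_L001_R1_002.fastq",
                   "2112_lane1_NoIndex_L001_R1_001.fastq",
                   "2112_lane1_TGACCA_L001_R1_001.fastq")
cmds <- build_cat_commands(example.files)
print(cmds)
```

Each element of `cmds` could then be passed to `system()`, after printing the commands and checking them by eye first.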

Let’s check out the concatenated files, just to make sure they look OK.

setwd("~/Documents/C-virginica-BSSeq")
system("head -10 2112_lane1_ACAGTG_L001_R1.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1217:2102 1:N:0:ACAGTG
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAACAAAATATTGACAGTTCATTAGAAGTTTACAACA
+
@@;DDD?DD;<CFHIIIGGGHEHGIIGGDD?FHD@F;?FGDDGCBFCDAFDB';=@GG@DAEH(.=?).;@A#############################
@HWI-ST0747:461:C64YUACXX:1:1101:1243:2121 1:N:0:ACAGTG
CTCACTAAACATCACGACTCCATCCGATAATAAATCCTAATTATACGCTACATCTACATCCTCCTTCCTTAATATACTCCGTTTACTAAATCATCCCATCA
+
CCCFFFFFHHHHHJJJAFHGIJJJJJGGIIGGIJJJIJJIJJIJIIJIIIJGHGHFHIJIJJJHFHHHHFFFFFFFEEEEDACDDDDECCCDDDDDDDDD:
@HWI-ST0747:461:C64YUACXX:1:1101:1145:2163 1:N:0:ACAGTG
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAATCACAAAACACAAAAACAAAATTCTAAT
system("tail -10 2112_lane1_ACAGTG_L001_R1.fastq")
+
=:?DDF?DDCFHHIJJGFHIEHHIHJJGHIIIJ>DHIGHFIJJJGFGHHHHABGHFIIIJJJGA=E>?B################################
@HWI-ST0747:461:C64YUACXX:1:2316:20101:100871 1:N:0:ACAGTG
CACCGACGTCACCACGATAAAATTAAACAATCTAAGAACAAAAACGACCGCCACGATAACCACCAAACACCCGAACTACGCAATCCTCGCCGCAAGAATCG
+
1114:8ADDD=<+<8?:??DEEEIDA>3BDEDBDE4?DDEDIBAA;AAA89=@<'9?8?(>>85==????>??>>>?AA?>???A?AA>>?>>995??AA>
@HWI-ST0747:461:C64YUACXX:1:2316:20494:100932 1:N:0:ACAGTG
CTCCTCCAACTAACCGATAACGTCGAAACGANCNNCCTCGCGAAAACCAACTAATCAAAAAAATCCCGATATCCGNAAACGCTCAACCAATACAATAACCT
+
?+:BDDDDAHDHHIID<GHG>):CDHHI<FH#0##(-7C1AEFD9ACC?BB>CCCCCCCCBBBBCCC7>BBBCEB#+2<@B@BBBCCBBBCCCCCCCCCCC
system("head -10 2112_lane1_ATCACG_R1.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1222:2147 1:N:0:ATCACG
CTATACAAACACTAAACTAACTTCTCCCACAAACCTATATACTACCGAATACTATTATCCAAACAATAAAACTATCGACTCCCCTACGCCCTCAAAATTTC
+
@@<D?DB;BFDFB@;F@CEDGIIGEHHIGGG=8CFIII>FHII>GIE06B?FD=BG:BHGC7=@GA>7@>7AHHC3?235<@?BCBB=BBBBB<AA::@CA
@HWI-ST0747:461:C64YUACXX:1:1101:1486:2076 1:N:0:ATCACG
GANCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAGGAATAGGGAAAGGGAAAAGGGTCGCGAA
+
@@#4ADDDHHHHGJGIJIIIJIIJIIIJJGHFHIJJIIJHIIFIJGHIIIJJJIGGGIJIGGH=>E@##################################
@HWI-ST0747:461:C64YUACXX:1:1101:1442:2090 1:N:0:ATCACG
CATCCCCAATACGAAAACATAATCGACGTAAACGCACCCGCTACCGATTATTAAAATACATCATCAATAAAATATAATCATAAATCAAATAACAAAAACAT
system("tail -10 2112_lane1_ATCACG_R1.fastq")
+
=:=D=<DDDF?F,A?GFIE8B@GIFAF?1:CD@*?D?FBD??GCFFB?888;'565;CCC=(=?<()6>>;;92>3,(,8?BB719@##############
@HWI-ST0747:461:C64YUACXX:1:2316:20682:100827 1:N:0:ATCACG
CTACCACGATCATTACAAAAATCAAAAAAATATCGCAATCATTATTCTACGATATTATACTTTATATAAAATAATTAATTATGATTAACTAAAAAACATTT
+
@<<D+==:<DDFDHIIIG::<<9CHIDGII9D?DF=66@B<CHIIIIHIIFGG=:CHCHHECDEEEEEEAC@CCDCDCEEEECCEEECCCCCCCCBBBCCE
@HWI-ST0747:461:C64YUACXX:1:2316:20681:100878 1:N:0:ATCACG
TTAATCCCTTCAATTAATCAGAGACCAGAGAACTTGTGGAGGGAAAATCCACACACCAATGAGAAAGAAAAAGACAAGATAAAACACCAATGGAATTAAAA
+
?<+=BBADFHHGGJJIGIJIIIBCFFHCGGIEFIIIGIGGBFHDHIGIIIJ@GIIJJJFII<@DHFHHHFFFCDECEBDDDDCDDDBD9?ACDDAAC@ACD
system("head -10 2112_lane1_CAGATC_R1.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1170:2189 1:N:0:CAGATC
CTCCGCAAATTATAAAAAAATTTATCCCTTACGTAAAAAAACCAAAAAAATAAACAAAAATCGCGAACTCCCAAAAAAACTTACTATACCACTTCCTCCTC
+
@?@DDDDFG>FDHIGIGHHGGEHGGII@GHIIHIGGGEIGGIIHHIHGFC>CCDDDBB?BBCBDD;@BBD:A9A828>BBCDDDDEEDCCD@CDCCCCCCC
@HWI-ST0747:461:C64YUACXX:1:1101:1357:2076 1:N:0:CAGATC
CCNCGTAGACGTCATCCTGAAATACCCCATAGAGGTGTGAATCTCCTCAGGCCCCGAGTTCTTCCCTCCGGAGAAGATAGAAACCAAACTAAAAAACAGCA
+
@@#4=ABDHFFFHIIIIIG@ECHIIGIIIIIIGGGDFBFGHCHGGGGIIIHIIIIIAB5;CEHHCEEFFDCD@;255:ACC@CCDB<BDDDDCCCBB(9?9
@HWI-ST0747:461:C64YUACXX:1:1101:1311:2076 1:N:0:CAGATC
CANATACCTTAACAAAACAATAACACACAAACTATCACCGTACACACACCTTAACAAAACAATAACACATAAACTATCACAATACACACACCTAAACAAAA
system("tail -10 2112_lane1_CAGATC_R1.fastq")
+
=?=BBDDFGGDDFGGE@GCFDHIICFFGB@<:?1@ADBFHGEGIGG3BBFHE;F>CEAEECCHFFEFDC8;??BDA@CDCDCDDEE@CCDCCCCCCCCCDC
@HWI-ST0747:461:C64YUACXX:1:2316:21166:100797 1:N:0:CAGATC
CATCTCAACAAAACAATATAATTAACTCAAAAAATAATTAAAATAAAATACTTAATACACAAACAACAACATATAAATACTTAAAAAAAACTATAATTAAC
+
+1::?+A;D<D?DA3C<<A???C:F############################################################################
@HWI-ST0747:461:C64YUACXX:1:2316:21222:100911 1:N:0:CAGATC
AATAAATCATAAACAAACGGGCAACATACCTAATCACAATAAATATAACACACCACAATACCCAAAATCCCTAATCACAATAAATCATAAACAAACGGGCA
+
;;:BD;?D?F=CDGII>GAEGAHGIIII*?FHIEHGB?DDGGIIAHGG<F?8-7=CGG:GEH)@@CAEHEE>DDECCC>CCCCCE@@CCCCCCBBBB?<B@
system("head -10 2112_lane1_GCCAAT_L001_R1.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1199:2076 1:N:0:GCCAAT
GTNCATCCTCAGAGTGGTTCTCTTGGTACGATCCCTTTGCACCATCATGATACTGCTGCACATTCAGGTTGGAGGAGGCACTTCGGAAAAGAATAGATCGG
+
CC#4ADDFHHGHHHIFHHIJJJJJJJHGJJJJJJJIJJJJJJIIHIIJJIJGIJIJJIIIIFIJJIIJ=DGF?CHEBFCDBEDDDD;<?CA?ACCDDDDDD
@HWI-ST0747:461:C64YUACXX:1:1101:1123:2118 1:N:0:GCCAAT
TCACCTACCCCGACCATTCACACCACACGTACCCATAACCATCGCCCCGAATTCCTCGCAACCACAAAACTCGAACTTTAACATATCCCAACTAACCGCTA
+
@@@?BDDDF?FFDF@@GH>HIGGIIIGHF?DD=D9BGGIFB?C1BFHIFA';BBDCCC>9;BB@B<??BBACB(98?CCCCCCCCDCDC>CBC@CCB<>B9
@HWI-ST0747:461:C64YUACXX:1:1101:1195:2123 1:N:0:GCCAAT
TGGCTGTAATAAATTGGATCGAGCAAAGAGTTACTGTGGAAATGAGAAACACAAAGCCTGGTGGTAACCAAACACCGCCAGTTCAGTGTGAGAGTGTGAAA
system("tail -10 2112_lane1_GCCAAT_L001_R1.fastq")
+
:1144ADBDFD?<AF=EHHGBG<=G@ECFHHIIIAG<DAEGFGDHGGIG>CEEEFFEEDCCCCCEEDC@BCBBCC<?A?C@CCCCBB>BBCCECCCBBCCB
@HWI-ST0747:461:C64YUACXX:1:2316:20993:100848 1:N:0:GCCAAT
CTGCAACATATATTATACAGATTAAAAAAGAATAATTAAGTACACATTTCTCCTTATTCATATAATCTTGAAATCTAACAATTGATTATACATTGATTATG
+
@@@DDDDDHFD>DHHGGIIFG9F<FHIFHEFGGIHA?GG?BFEGAB*?FH*BBC@FGFBBFCHHC<==))=8@GE=DH@=EEHE>B@;@3?>CD@AD@CAC
@HWI-ST0747:461:C64YUACXX:1:2316:20831:100902 1:N:0:GCCAAT
AGTAAATCATAAACAAACGGGCAACATACCTTAAACTTCCCAAAAAATCAAATCTAAATTAAAACATACTTTCAATACCACTTATACTTATCATAAAAAAC
+
:11=+2AA2:=CDEIE:ACAE:EFEE)C9):DD4*0?D4B?DDDDDDDCEC;C@CCC>>@@==A)?A??AA@DA@>AA6;@AAAAA>ABDAAEDBAAA>>?
system("head -10 2112_lane1_TTAGGC_L001_R1.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1192:2181 1:N:0:TTAGGC
CACACTAAAATTACTCTAACAAAAAACATCCGTTTTATAATAATCATCTCCTCAAACCCATAACACTCACACCAATTACAAAACGTTTAACAATAAAACTA
+
??@+=BDDF?FAAE<E,CHHFC@GEEGFIE>A:D*?D@FE@?909DBB*??<BC)=F@@FICCEEE;C=EED;BB6;ABCCABBB(999:@A:@A@55:@#
@HWI-ST0747:461:C64YUACXX:1:1101:1123:2211 1:N:0:TTAGGC
ACTACCACAAACCAATTATCCCTATAATAACTTTTCTAACACCTCTTACTTAAAACTCCTAAAATCAAAAAAATCAATAAACCCCACTTTCACAATCTATA
+
?@BDFFFFFHGHHJAHIFHJIJIHHHIIJIJJJJJJGGIIEIIIJJJIJIJCGGBDDHIJGGHIEHIDFHIE>@CCACCCCCCDDDDDDDDDDCCCDDCD:
@HWI-ST0747:461:C64YUACXX:1:1101:1158:2233 1:N:0:TTAGGC
AACGGGCAATACCTCAAAAAATAAAACGTCAACCTCAACCTAAAAAAACCCAACCTATTACCCAATATCAATCTCAACCTAAAAAAACCCAACCCTATCAC
system("tail -10 2112_lane1_TTAGGC_L001_R1.fastq")
+
1++4=BDDFHHH<FGGIGGDGHIIIIIIIHHIIIIIGIIIIII@EE#######################################################
@HWI-ST0747:461:C64YUACXX:1:2316:20326:100909 1:N:0:TTAGGC
CATTACCAACAAACCCCTCCTCAAACTCATACAACTTCCCATCAACATACAATTTATCCCTAACCAACTTTATCCTTTAACCTACTTATCTACATCACCTC
+
@@+=BDBEABFBFEAB@GBCFHHHIHGIFIGGIIIIIDHGG@GIEIGG<DHHH@=GGHIAHAAGE==CEEEHCEE>@DF;;CEECE;@CCCD>CCCCCACB
@HWI-ST0747:461:C64YUACXX:1:2316:20770:100804 1:N:0:TTAGGC
AATAAATCATAAACAAACGGGCAACATACCTCATAACTACACAACCTAACACAACACCATAATTACACAACCTTACACTACAACATAACTACAAAACCTAA
+
?@@DDDDDFFDD8<BBHEIIB11?C3??EGFGIDCGGHGGGHIHFHIIIG@FCHGIG@DHIHBA>=AADEB?CCCCCCCCCCCCBCCCCCC@C(:AABACC
system("head -10 2112_lane1_TGACCA_L001_R1_001.fastq")
@HWI-ST0747:461:C64YUACXX:1:1101:1110:2171 1:N:0:TGACCA
CGCGCGTTTGTTATTGTCTGATGTACAGATATCCCCATTGATGTTTTCTGTGATGTTGTGTGACCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTG
+
CCCFFFDDHHHHFHIIIHIJEHIIDIGHGIII?CBDDGHIGGGIJIJFHIIIFHFBFHDHHIGEDHCD:AHF;?D@A@>=CBBABD@:>::(:(5:C@CCC
@HWI-ST0747:461:C64YUACXX:1:1101:1225:2225 1:N:0:TGACCA
CCTCGCTGAAAACTGCATGTTTTCGCATTTTGTACAAATCAAAGTTCTCAAGTACAGATTTTAGATCGGTAGAGCACACGTCTGAACTCCAGTCACTGACC
+
???DD??DD<3C32<:+AC?4C+<1?C18??*1??4:DD>*B*90?D<98*99?C:BC<C@C87ACE2-(64=7..6@;36=;=(>>(-5:>AA::A:555
@HWI-ST0747:461:C64YUACXX:1:1101:1348:2168 1:N:0:TGACCA
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAATATATAAAAAACTTAATAACGTAACAGA
system("tail -10 2112_lane1_TGACCA_L001_R1_001.fastq")
+
1:14=BDDFHHHHJGIJHJJJHIFHIJDHHIJJ9C3:??DFHIJJIIJJJJJIJJJJJJJJJIHGHEEDA###############################
@HWI-ST0747:461:C64YUACXX:1:2316:19569:100878 1:N:0:TGACCA
TACCTGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACACAACCTCCCCACATTCAAACAATC
+
=8:AA?DDAFDF<FF+@EEGFHIIIIEFIFCFIIIIGFFDFGFF;BFFFAD<FFCFIGFIFFCED=7?BCB##############################
@HWI-ST0747:461:C64YUACXX:1:2316:20085:100797 1:N:0:TGACCA
CTGATCTCATTGTTCTGTTGTCGAGGTCTGTTCTGATCTCATTGTTCCGTTGTCTAGGTCTGTTCTGATCTCATCGTTAGATCGGAAGAGCACACGTCTGA
+
+8=D:DDDHHHHFHIIIGIIIIEEEHFFHGGGIGIIIIGHGIIIHIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIGHHHHEEEEECBCBBCCBCBBBBCB@

That all looks like FASTQ data, so we’ll assume it’s OK!
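Eyeballing head and tail only covers the ends of each file. A slightly stronger check would confirm the four-line record structure over a larger sample of reads. The sketch below is one way to do that. `check_fastq` is a hypothetical helper (not from any package), and the demo files are tiny made-up records.

```r
# Hypothetical helper: check FASTQ structure for the first n.records reads
check_fastq <- function(path, n.records = 1000) {
  lines <- readLines(path, n = 4 * n.records)
  # A FASTQ file is made of complete 4-line records
  if (length(lines) == 0 || length(lines) %% 4 != 0) return(FALSE)
  header <- lines[seq(1, length(lines), by = 4)]
  bases  <- lines[seq(2, length(lines), by = 4)]
  plus   <- lines[seq(3, length(lines), by = 4)]
  quals  <- lines[seq(4, length(lines), by = 4)]
  # Headers start with "@", separators with "+", and each read needs
  # exactly one quality character per base
  all(startsWith(header, "@")) &&
    all(startsWith(plus, "+")) &&
    all(nchar(bases) == nchar(quals))
}

# Made-up demo files: one well-formed, one with a truncated quality string
good <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "TTGA", "+", "FFFF"), good)
bad <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "II"), bad)

print(check_fastq(good))
print(check_fastq(bad))
```

This only looks at the start of a file, so a truncated final record in a large file would still slip through unless `n.records` covers the whole thing.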

Previously, I downloaded the new reference genome from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/MWPT00000000.1/) and stored it on Owl, so I need to fish out the reference .fasta file to have something local to work with.

system("mkdir /home/srlab/Documents/C-virginica-BSSeq/genome/")
system("scp /home/srlab/Documents/owl/scaphapoda/Sean/VirginicaGenome/C_virginica_1.0_genome.fasta ~/Documents/C-virginica-BSSeq/genome/")
system("head -10 ~/Documents/C-virginica-BSSeq/genome/C_virginica_1.0_genome.fasta")
>KV918244.1 Crassostrea virginica isolate RU13XGHG1-28 chromosome 1 unlocalized genomic scaffold LG1_scaffold10_1, whole genome shotgun sequence
TAAATTGACAGGTCCTCCAGGATGTCAACACGAGAAAAAAGTTGTGGGTCTGCCTTGCGGCTGTTAGTAGACCTGTCAAA
ATACTGGTCTATGGCGATATAGCTGCCGAGCGGCTTCACGATTTCGTACGGAGGTCTTGTGGCAACCTTACGAAGCCGAG
ccattttctattcaaaataatgttcatCCGACAGCCATACAGCAGCCCCAATAGTGATGTGAAATAGGCTTAAAGGATTT
TCTGACGACGTGCAATCGAAGCGAGCTGATTCCAGACGAAATCGCAAACATAATAGAAGGCCCAAAACAGCTGTGGAATG
ATAGACATAGTACGTTAACCGTACACGCCGCGTGTTGTCCGTCcgaaatatttttctaaaaagacGCTTGCTTGAAACCT
CTGCGACAATTTTACGAATTGCGGTTGAAAGCCATTTCACAAGGTTGCCATAGAAAAACCATATGCTGGTTGTACATTTA
TCCTTTCCACAACGAAATATAAATGACTGTGTAAATCCCAGGAGATGACTATTTAGTAATCTTTCACCATTTTCCTCAGC
CAACCCTACCATAATCAGCAAGTGCATACATAAAAGGAAGGGGGTTGCTTTTTGAAAataggaaattcataattttagat
ttagtttactctacatgtatatattttaacttgGAGATAAATCcgtataaaattgaaattttgaaaattcctactgcaaa

That looks good. So… Bismark? Hopefully the genome prep step isn’t too computationally intensive. Emu is currently running a 12-core BLAST job and a single-core Dammit annotation run, and it is also downloading and gzipping a bunch of SRA data sets. Not much power to spare on the poor old bird.

setwd("~/Documents/C-virginica-BSSeq/")
system("/home/shared/Bismark/bismark_genome_preparation ~/Documents/C-virginica-BSSeq/genome/")
Writing bisulfite genomes out into a single MFA (multi FastA) file

Bisulfite Genome Indexer version v0.16.3 (last modified 25 August 2015)


Step I - Prepare genome folders - completed

Total number of conversions performed:
C->T:   119233917
G->A:   119258524
Please be aware that this process can - depending on genome size - take several hours!

Step II - Genome bisulfite conversions - completed


Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Settings:
  Output files: "BS_CT.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  genome_mfa.CT_conversion.fa
Reading reference sizes
Building a SMALL index
Settings:
  Output files: "BS_GA.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  genome_mfa.GA_conversion.fa
Reading reference sizes
Building a SMALL index
  Time reading reference sizes: 00:00:13
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time reading reference sizes: 00:00:18
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:14
bmax according to bmaxDivN setting: 171165873
Using parameters --bmax 128374405 --dcv 1024
  Doing ahead-of-time memory usage test
  Time to join reference sequences: 00:00:15
bmax according to bmaxDivN setting: 171165873
Using parameters --bmax 128374405 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 128374405 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  Passed!  Constructing with these parameters: --bmax 128374405 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:40
  Allocating rank array
  Ranking v-sort output
  V-Sorting samples time: 00:00:39
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:09
  Invoking Larsson-Sadakane on ranks
  Ranking v-sort output time: 00:00:09
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:16
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  Invoking Larsson-Sadakane on ranks time: 00:00:15
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  10%
  20%
  20%
  30%
  30%
  40%
  40%
  50%
  50%
  60%
  60%
  70%
  70%
  80%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:08
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 9.78091e+07 (target: 128374404)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  90%
  10%
  20%
  30%
  40%
  100%
  Binary sorting into buckets time: 00:01:13
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  50%
  60%
  70%
  10%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:16
  Sorting block of length 87601004
  (Using difference cover)
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:00:55
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 1; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:00:56
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
  Binary sorting into buckets
  10%
  Sorting block time: 00:01:54
Returning block of 87601005
  20%
  30%
  40%
  50%
Getting block 2 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  60%
  40%
  50%
  60%
  70%
  70%
  80%
  90%
  80%
  100%
  Block accumulator loop time: 00:00:18
  Sorting block of length 53493988
  (Using difference cover)
  90%
  100%
  Binary sorting into buckets time: 00:00:58
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 8.55829e+07 (target: 128374404)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:17
  Sorting block of length 37711281
  (Using difference cover)
  Sorting block time: 00:01:08
Returning block of 53493989
  Sorting block time: 00:00:43
Returning block of 37711282
Getting block 2 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
Getting block 3 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  10%
  20%
  20%
  30%
  30%
  40%
  40%
  50%
  50%
  60%
  60%
  70%
  70%
  80%
  80%
  90%
  90%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 117000179
  (Using difference cover)
  100%
  Block accumulator loop time: 00:00:21
  Sorting block of length 118436539
  (Using difference cover)
  Sorting block time: 00:02:23
Returning block of 117000180
  Sorting block time: 00:02:38
Returning block of 118436540
Getting block 3 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
Getting block 4 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  70%
  10%
  80%
  20%
  90%
  30%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 120965860
  (Using difference cover)
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 103117892
  (Using difference cover)
  Sorting block time: 00:02:26
Returning block of 120965861
  Sorting block time: 00:02:15
Returning block of 103117893
Getting block 5 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
Getting block 4 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  10%
  20%
  20%
  30%
  30%
  40%
  40%
  50%
  50%
  60%
  60%
  70%
  70%
  80%
  80%
  90%
  90%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 91375156
  (Using difference cover)
  100%
  Block accumulator loop time: 00:00:24
  Sorting block of length 91937189
  (Using difference cover)
  Sorting block time: 00:01:48
Returning block of 91375157
  Sorting block time: 00:01:57
Returning block of 91937190
Getting block 5 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
Getting block 6 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  60%
  10%
  70%
  20%
  80%
  90%
  30%
  100%
  Block accumulator loop time: 00:00:20
  Sorting block of length 65683113
  (Using difference cover)
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:25
  Sorting block of length 109022689
  (Using difference cover)
  Sorting block time: 00:01:16
Returning block of 65683114
Getting block 6 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:23
  Sorting block of length 112552660
  (Using difference cover)
  Sorting block time: 00:02:22
Returning block of 109022690
Getting block 7 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:17
  Sorting block of length 121054188
  (Using difference cover)
  Sorting block time: 00:02:13
Returning block of 112552661
Getting block 7 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 98566080
  (Using difference cover)
  Sorting block time: 00:02:42
Returning block of 121054189
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 223107998
fchr[G]: 223107998
fchr[T]: 342366522
fchr[$]: 684663495
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 232434413 bytes to primary EBWT file: BS_CT.1.bt2
Wrote 171165880 bytes to secondary EBWT file: BS_CT.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 684663495
    bwtLen: 684663496
    sz: 171165874
    bwtSz: 171165874
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 42791469
    offsSz: 171165876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3565956
    numLines: 3565956
    ebwtTotLen: 228221184
    ebwtTotSz: 228221184
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:22:28
Reading reference sizes
  Time reading reference sizes: 00:00:11
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Sorting block time: 00:01:57
Returning block of 98566081
  Time to join reference sequences: 00:00:13
  Time to reverse reference sequence: 00:00:02
bmax according to bmaxDivN setting: 171165873
Using parameters --bmax 128374405 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 128374405 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
Getting block 8 of 8
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:15
  Sorting block of length 40809159
  (Using difference cover)
  V-Sorting samples time: 00:00:39
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:10
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:16
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  Sorting block time: 00:00:47
Returning block of 40809160
  20%
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 342366522
fchr[G]: 461600439
fchr[T]: 461600439
fchr[$]: 684663495
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 232434413 bytes to primary EBWT file: BS_GA.1.bt2
Wrote 171165880 bytes to secondary EBWT file: BS_GA.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 684663495
    bwtLen: 684663496
    sz: 171165874
    bwtSz: 171165874
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 42791469
    offsSz: 171165876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3565956
    numLines: 3565956
    ebwtTotLen: 228221184
    ebwtTotSz: 228221184
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:24:23
Reading reference sizes
  30%
  40%
  50%
  Time reading reference sizes: 00:00:15
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  60%
  70%
  Time to join reference sequences: 00:00:14
  Time to reverse reference sequence: 00:00:02
bmax according to bmaxDivN setting: 171165873
Using parameters --bmax 128374405 --dcv 1024
  Doing ahead-of-time memory usage test
  80%
  Passed!  Constructing with these parameters: --bmax 128374405 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  90%
  100%
  Binary sorting into buckets time: 00:01:03
Splitting and merging
  Splitting and merging time: 00:00:00
Split 2, merged 8; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  V-Sorting samples time: 00:00:37
  Allocating rank array
  Ranking v-sort output
  60%
  70%
  Ranking v-sort output time: 00:00:09
  Invoking Larsson-Sadakane on ranks
  80%
  90%
  Invoking Larsson-Sadakane on ranks time: 00:00:15
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  100%
  Binary sorting into buckets time: 00:00:54
Splitting and merging
  Splitting and merging time: 00:00:00
Split 2, merged 1; iterating...
  Binary sorting into buckets
  10%
  10%
  20%
  20%
  30%
  30%
  40%
  40%
  50%
  60%
  50%
  70%
  60%
  80%
  70%
  90%
  80%
  100%
  Binary sorting into buckets time: 00:00:56
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 9.78091e+07 (target: 128374404)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  90%
  40%
  50%
  60%
  100%
  Binary sorting into buckets time: 00:01:09
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  70%
  80%
  90%
  10%
  100%
  Block accumulator loop time: 00:00:18
  Sorting block of length 110146235
  (Using difference cover)
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:00:57
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 9.78091e+07 (target: 128374404)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:18
  Sorting block of length 103611807
  (Using difference cover)
  Sorting block time: 00:02:24
Returning block of 110146236
Getting block 2 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:19
  Sorting block of length 112961761
  (Using difference cover)
  Sorting block time: 00:02:02
Returning block of 103611808
Getting block 2 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:22
  Sorting block of length 102281886
  (Using difference cover)
  Sorting block time: 00:02:26
Returning block of 112961762
  Sorting block time: 00:02:01
Returning block of 102281887
Getting block 3 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
Getting block 3 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  80%
  10%
  90%
  20%
  100%
  Block accumulator loop time: 00:00:25
  Sorting block of length 127701342
  (Using difference cover)
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:25
  Sorting block of length 127638444
  (Using difference cover)
  Sorting block time: 00:02:49
Returning block of 127701343
  Sorting block time: 00:02:35
Returning block of 127638445
Getting block 4 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
Getting block 4 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  30%
  10%
  40%
  20%
  50%
  30%
  60%
  40%
  70%
  50%
  80%
  60%
  90%
  70%
  100%
  Block accumulator loop time: 00:00:21
  Sorting block of length 56383955
  (Using difference cover)
  80%
  90%
  100%
  Block accumulator loop time: 00:00:24
  Sorting block of length 118984705
  (Using difference cover)
  Sorting block time: 00:01:12
Returning block of 56383956
Getting block 5 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:24
  Sorting block of length 72299174
  (Using difference cover)
  Sorting block time: 00:02:24
Returning block of 118984706
Getting block 5 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  Sorting block time: 00:01:31
Returning block of 72299175
  100%
  Block accumulator loop time: 00:00:24
  Sorting block of length 101221118
  (Using difference cover)
Getting block 6 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:25
  Sorting block of length 87190517
  (Using difference cover)
  Sorting block time: 00:01:59
Returning block of 101221119
Getting block 6 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  Sorting block time: 00:01:56
Returning block of 87190518
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:24
  Sorting block of length 120633961
  (Using difference cover)
Getting block 7 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:17
  Sorting block of length 117980505
  (Using difference cover)
  Sorting block time: 00:02:28
Returning block of 120633962
Getting block 7 of 7
  Reserving size (128374405) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  Sorting block time: 00:02:40
Returning block of 117980506
  80%
  90%
  100%
  Block accumulator loop time: 00:00:15
  Sorting block of length 10291568
  (Using difference cover)
  Sorting block time: 00:00:11
Returning block of 10291569
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 342366522
fchr[G]: 461600439
fchr[T]: 461600439
fchr[$]: 684663495
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 232434413 bytes to primary EBWT file: BS_GA.rev.1.bt2
Wrote 171165880 bytes to secondary EBWT file: BS_GA.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 684663495
    bwtLen: 684663496
    sz: 171165874
    bwtSz: 171165874
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 42791469
    offsSz: 171165876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3565956
    numLines: 3565956
    ebwtTotLen: 228221184
    ebwtTotSz: 228221184
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:22:19
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 223107998
fchr[G]: 223107998
fchr[T]: 342366522
fchr[$]: 684663495
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 232434413 bytes to primary EBWT file: BS_CT.rev.1.bt2
Wrote 171165880 bytes to secondary EBWT file: BS_CT.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 684663495
    bwtLen: 684663496
    sz: 171165874
    bwtSz: 171165874
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 42791469
    offsSz: 171165876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3565956
    numLines: 3565956
    ebwtTotLen: 228221184
    ebwtTotSz: 228221184
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:24:21

Well, that’s done! Going to switch to a new notebook for quality trimming, since it’s all kinds of spammy.
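Before moving on, a quick sanity check that the index files actually landed on disk seems worthwhile. The sketch below is not something I ran as part of this notebook. It assumes Bismark put its output in `Bisulfite_Genome/CT_conversion` and `Bisulfite_Genome/GA_conversion` under the genome directory, and that each Bowtie 2 index has the usual six `.bt2` files (the log above only confirms the `.rev.1.bt2` and `.rev.2.bt2` ones). `missing.index.files` is a hypothetical helper name, not part of Bismark.

```r
# Hypothetical helper: list the expected Bowtie 2 index files that are
# NOT present in a conversion directory. An empty result means all six exist.
missing.index.files <- function(conv.dir, prefix) {
  # Assumption: a small (non-"large") Bowtie 2 index consists of these six files.
  suffixes <- c(".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")
  expected <- paste0(prefix, suffixes)
  expected[!file.exists(file.path(conv.dir, expected))]
}

# Assumption: the genome directory used for bismark_genome_preparation,
# with Bismark's output under Bisulfite_Genome/.
bs.dir <- file.path(path.expand("~/Documents/C-virginica-BSSeq/genome"),
                    "Bisulfite_Genome")

missing.index.files(file.path(bs.dir, "CT_conversion"), "BS_CT")
missing.index.files(file.path(bs.dir, "GA_conversion"), "BS_GA")
```

If both calls return `character(0)`, every expected file exists. Anything listed is a file to go looking for before starting alignment.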
