We’ve been doing some bisulfite mapping with Bismark, and one issue we noticed was abysmal mapping efficiency: Hollie saw around 3%, while I got 11% from some data Steven provided. The Bismark developers published a paper here offering some suggestions to improve that, so I’m going to try those out.
First, they suggest running some form of quality control tool on the data. They use FastQC, but say that FastX toolkit, Prinseq, or another tool would also work. I’m going to use FastQC, just for simplicity’s sake.
install.packages("IRdisplay")
Installing package into ‘/home/sean/R/x86_64-pc-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)
also installing the dependency ‘repr’
trying URL 'https://cran.rstudio.com/src/contrib/repr_0.10.tar.gz'
Content type 'application/x-gzip' length 21985 bytes (21 KB)
==================================================
downloaded 21 KB
trying URL 'https://cran.rstudio.com/src/contrib/IRdisplay_0.4.4.tar.gz'
Content type 'application/x-gzip' length 6435 bytes
==================================================
downloaded 6435 bytes
* installing *source* package ‘repr’ ...
** package ‘repr’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (repr)
* installing *source* package ‘IRdisplay’ ...
** package ‘IRdisplay’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (IRdisplay)
The downloaded source packages are in
‘/tmp/Rtmpt7ibts/downloaded_packages’
Let’s run FastQC on our data files!
time.fastqc <- proc.time()
system("gksudo /home/shared/fastQC/fastqc ~/Documents/Bismark-Mapping-Eff/Data/*.fastq")
time.fastqc2 <- proc.time() - time.fastqc
display_html(file = html_file_names[1])
cannot open file '1_ATCACG_L001_R1_001_fastqc.html': No such file or directory
Error in file(con, "rb") : cannot open the connection
I can’t find a nice way to render HTML documents inside an R notebook, so the output files from FastQC can be found here.
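The error above suggests the report name was resolved relative to the working directory, and `html_file_names` is never defined in the code shown. A minimal sketch of collecting the report paths first, assuming FastQC wrote its `*_fastqc.html` reports next to the input files (its default when no output directory is given). The helper name `find_fastqc_reports` is hypothetical, and opening the report in a browser is a swapped-in alternative to rendering it inline:

```r
# Hypothetical helper: list FastQC HTML reports in a directory, returning
# full paths so they can be opened regardless of the working directory.
find_fastqc_reports <- function(dir) {
  list.files(path = path.expand(dir),
             pattern = "_fastqc\\.html$",
             full.names = TRUE)
}

# Assumed location: the same directory the FASTQ files were read from.
html_file_names <- find_fastqc_reports("~/Documents/Bismark-Mapping-Eff/Data")

if (length(html_file_names) == 0) {
  message("No FastQC HTML reports found; check the FastQC output directory.")
} else if (interactive()) {
  # Open the first report in the default browser instead of inline.
  utils::browseURL(html_file_names[1])
}
```

Using `full.names = TRUE` is the key choice here: it keeps the paths valid even after the notebook resets the working directory at the end of a chunk.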
The FastQC reports look pretty rough, with some indication of adapter contamination and low-quality reads. We’ll run Trimmomatic here, as it’s what Hollie did in her notebook. The arguments she used are LEADING:3, TRAILING:3, and MINLEN:20. I also pulled the TruSeq adapter sequences from there. The command below additionally applies ILLUMINACLIP (with those adapters) and SLIDINGWINDOW:4:15.
setwd("~/Documents/Bismark-Mapping-Eff/Data")
The working directory was changed to /home/sean/Documents/Bismark-Mapping-Eff/Data inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
# list.files() takes a regular expression, not a shell glob
temp.file.list <- list.files(path = "~/Documents/Bismark-Mapping-Eff/Data", pattern = "\\.fastq$")
trimo.time <- proc.time()
# Trim each file: clip TruSeq adapters, trim low-quality ends, drop reads under 20 bp
for (i in seq_along(temp.file.list)) {
  system(paste0("TrimmomaticSE -threads 16 -phred33 ~/Documents/Bismark-Mapping-Eff/Data/", temp.file.list[i], " ~/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_", temp.file.list[i], " ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20"))
}
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_10.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_10.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 113424382 Surviving: 104694897 (92.30%) Dropped: 8729485 (7.70%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_11.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_11.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 85167343 Surviving: 78581541 (92.27%) Dropped: 6585802 (7.73%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_12.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_12.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 87308544 Surviving: 80542929 (92.25%) Dropped: 6765615 (7.75%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_13.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_13.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 92858760 Surviving: 86407336 (93.05%) Dropped: 6451424 (6.95%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_14.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_14.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 140204208 Surviving: 128699413 (91.79%) Dropped: 11504795 (8.21%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_15.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_15.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 99431044 Surviving: 91725899 (92.25%) Dropped: 7705145 (7.75%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_16.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_16.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 107049378 Surviving: 99156555 (92.63%) Dropped: 7892823 (7.37%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_17.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_17.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 66494243 Surviving: 61272876 (92.15%) Dropped: 5221367 (7.85%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_18.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_18.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 78526645 Surviving: 73350026 (93.41%) Dropped: 5176619 (6.59%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_1.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_1.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 74152340 Surviving: 68942407 (92.97%) Dropped: 5209933 (7.03%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_2.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_2.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 73488249 Surviving: 67909800 (92.41%) Dropped: 5578449 (7.59%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_3.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_3.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 76394852 Surviving: 70652643 (92.48%) Dropped: 5742209 (7.52%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_4.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_4.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 64145277 Surviving: 59759440 (93.16%) Dropped: 4385837 (6.84%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_5.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_5.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 91405786 Surviving: 85154367 (93.16%) Dropped: 6251419 (6.84%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_6.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_6.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 72057411 Surviving: 67036403 (93.03%) Dropped: 5021008 (6.97%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_7.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_7.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 79104781 Surviving: 73263095 (92.62%) Dropped: 5841686 (7.38%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_8.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_8.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 62466580 Surviving: 57683509 (92.34%) Dropped: 4783071 (7.66%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments:
-threads 16 -phred33 /home/sean/Documents/Bismark-Mapping-Eff/Data/zr1394_9.fastq /home/sean/Documents/Bismark-Mapping-Eff/Cleaned/cleaned_zr1394_9.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 121559470 Surviving: 112917933 (92.89%) Dropped: 8641537 (7.11%)
TrimmomaticSE: Completed successfully
trimo.time2 <- proc.time() - trimo.time
print(trimo.time2)
user system elapsed
8029.268 714.868 3187.236
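The per-file summaries above are easier to compare once pulled into a table. A minimal sketch, assuming the Trimmomatic log lines have been captured into a character vector (for example by redirecting the output to a file and reading it with `readLines()`; the `system()` calls above only print them). The function name `parse_trimmomatic_summary` is hypothetical, and the two sample lines are copied from the output above:

```r
# Hypothetical helper: extract read counts from Trimmomatic SE summary lines
# of the form "Input Reads: N Surviving: N (x%) Dropped: N (y%)".
parse_trimmomatic_summary <- function(log_lines) {
  summary_lines <- grep("^Input Reads:", log_lines, value = TRUE)
  pattern <- "^Input Reads: ([0-9]+) Surviving: ([0-9]+) .*Dropped: ([0-9]+) .*$"
  input     <- as.numeric(sub(pattern, "\\1", summary_lines))
  surviving <- as.numeric(sub(pattern, "\\2", summary_lines))
  dropped   <- as.numeric(sub(pattern, "\\3", summary_lines))
  data.frame(input = input,
             surviving = surviving,
             dropped = dropped,
             # Recompute the survival percentage from the raw counts
             pct_surviving = round(100 * surviving / input, 2))
}

# Two summary lines copied from the Trimmomatic output above
log_lines <- c(
  "Input Reads: 113424382 Surviving: 104694897 (92.30%) Dropped: 8729485 (7.70%)",
  "Input Reads: 85167343 Surviving: 78581541 (92.27%) Dropped: 6585802 (7.73%)"
)
print(parse_trimmomatic_summary(log_lines))
```

Across the 18 files in this run, Trimmomatic dropped roughly 6.6% to 8.2% of reads per file, so a table like this mainly serves to confirm that no single library behaves very differently from the rest.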
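# TruSeq adapter sequence to remove from the 3' end of reads
ad <- 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
galore.time <- proc.time()
# Trim Galore: adapter/quality trim (Phred 20), clip 3 bp from each end, drop reads under 20 bp
# Note: --fastqc needs `fastqc` on the PATH; the log below shows it was not found
for (i in seq_along(temp.file.list)) {
  system(paste0("/home/shared/trimgalore/trim_galore --fastqc -a '", ad, "' -q 20 --clip_R1 3 --three_prime_clip_R1 3 --length 20 ", temp.file.list[i], " -o ~/Documents/Bismark-Mapping-Eff/Galore/"))
}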
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_10.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_10.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_10_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_10.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
100000000 sequences processed
110000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_10.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2460.02 s (22 us/read; 2.77 M reads/minute).
=== Summary ===
Total reads processed: 113,424,382
Reads with adapters: 43,185,182 (38.1%)
Reads written (passing filters): 113,424,382 (100.0%)
Total basepairs processed: 5,687,436,918 bp
Quality-trimmed: 9,039,182 bp (0.2%)
Total written (filtered): 5,632,062,338 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 43185182 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 44.7%
C: 10.5%
G: 4.9%
T: 39.4%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 40985747 28356095.5 0 40985747
2 1461680 7089023.9 0 1461680
3 591685 1772256.0 0 591685
4 117427 443064.0 0 117427
5 15243 110766.0 0 15243
6 1954 27691.5 0 1954
7 5798 6922.9 0 5798
8 1745 1730.7 0 1745
9 934 432.7 0 56 878
10 2320 108.2 1 8 2312
11 394 27.0 1 21 373
12 215 6.8 1 0 215
13 37 1.7 1 0 37
14 3 0.4 1 0 3
RUN STATISTICS FOR INPUT FILE: zr1394_10.fastq
=============================================
113424382 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 279725 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_11.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_11.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_11_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_11.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_11.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1819.71 s (21 us/read; 2.81 M reads/minute).
=== Summary ===
Total reads processed: 85,167,343
Reads with adapters: 32,665,400 (38.4%)
Reads written (passing filters): 85,167,343 (100.0%)
Total basepairs processed: 4,275,277,706 bp
Quality-trimmed: 5,973,003 bp (0.1%)
Total written (filtered): 4,234,832,504 bp (99.1%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 32665400 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.2%
C: 10.9%
G: 3.8%
T: 38.7%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 31424473 21291835.8 0 31424473
2 805245 5322958.9 0 805245
3 346186 1330739.7 0 346186
4 72175 332684.9 0 72175
5 9699 83171.2 0 9699
6 1183 20792.8 0 1183
7 2829 5198.2 0 2829
8 1044 1299.6 0 1044
9 526 324.9 0 64 462
10 1302 81.2 1 7 1295
11 446 20.3 1 48 398
12 229 5.1 1 2 227
13 61 1.3 1 0 61
14 1 0.3 1 0 1
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_11.fastq
=============================================
85167343 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 198131 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_12.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_12.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_12_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_12.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_12.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1863.91 s (21 us/read; 2.81 M reads/minute).
=== Summary ===
Total reads processed: 87,308,544
Reads with adapters: 35,302,633 (40.4%)
Reads written (passing filters): 87,308,544 (100.0%)
Total basepairs processed: 4,386,380,593 bp
Quality-trimmed: 5,918,621 bp (0.1%)
Total written (filtered): 4,343,142,503 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 35302633 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.7%
C: 10.9%
G: 3.4%
T: 38.5%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 33914599 21827136.0 0 33914599
2 905011 5456784.0 0 905011
3 376952 1364196.0 0 376952
4 89340 341049.0 0 89340
5 8923 85262.2 0 8923
6 1100 21315.6 0 1100
7 3137 5328.9 0 3137
8 1183 1332.2 0 1183
9 612 333.1 0 44 568
10 1280 83.3 1 11 1269
11 305 20.8 1 28 277
12 155 5.2 1 3 152
13 33 1.3 1 0 33
14 3 0.3 1 0 3
RUN STATISTICS FOR INPUT FILE: zr1394_12.fastq
=============================================
87308544 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 203131 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_13.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_13.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_13_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_13.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_13.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2035.10 s (22 us/read; 2.74 M reads/minute).
=== Summary ===
Total reads processed: 92,858,760
Reads with adapters: 36,482,833 (39.3%)
Reads written (passing filters): 92,858,760 (100.0%)
Total basepairs processed: 4,665,362,708 bp
Quality-trimmed: 5,844,545 bp (0.1%)
Total written (filtered): 4,620,952,841 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 36482833 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.7%
C: 10.9%
G: 3.7%
T: 38.3%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 35053970 23214690.0 0 35053970
2 928179 5803672.5 0 928179
3 396678 1450918.1 0 396678
4 83910 362729.5 0 83910
5 10795 90682.4 0 10795
6 1211 22670.6 0 1211
7 3540 5667.6 0 3540
8 1351 1416.9 0 1351
9 731 354.2 0 51 680
10 1624 88.6 1 8 1616
11 526 22.1 1 54 472
12 254 5.5 1 1 253
13 61 1.4 1 0 61
14 1 0.3 1 0 1
15 1 0.1 1 0 1
16 1 0.0 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_13.fastq
=============================================
92858760 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 193277 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_14.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_14.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_14_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_14.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
100000000 sequences processed
110000000 sequences processed
120000000 sequences processed
130000000 sequences processed
140000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_14.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 3019.97 s (22 us/read; 2.79 M reads/minute).
=== Summary ===
Total reads processed: 140,204,208
Reads with adapters: 55,361,537 (39.5%)
Reads written (passing filters): 140,204,208 (100.0%)
Total basepairs processed: 7,044,237,190 bp
Quality-trimmed: 9,772,210 bp (0.1%)
Total written (filtered): 6,976,341,780 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 55361537 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 47.1%
C: 10.1%
G: 3.1%
T: 39.2%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 53462724 35051052.0 0 53462724
2 1239767 8762763.0 0 1239767
3 514944 2190690.8 0 514944
4 118938 547672.7 0 118938
5 13830 136918.2 0 13830
6 1714 34229.5 0 1714
7 4371 8557.4 0 4371
8 1295 2139.3 0 1295
9 875 534.8 0 61 814
10 2164 133.7 1 13 2151
11 592 33.4 1 52 540
12 260 8.4 1 1 259
13 62 2.1 1 0 62
14 1 0.5 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_14.fastq
=============================================
140204208 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 333547 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_15.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_15.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_15_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_15.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_15.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2136.95 s (21 us/read; 2.79 M reads/minute).
=== Summary ===
Total reads processed: 99,431,044
Reads with adapters: 38,574,757 (38.8%)
Reads written (passing filters): 99,431,044 (100.0%)
Total basepairs processed: 4,990,166,643 bp
Quality-trimmed: 7,062,930 bp (0.1%)
Total written (filtered): 4,942,286,577 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 38574757 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.3%
C: 10.9%
G: 3.8%
T: 38.5%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 37048294 24857761.0 0 37048294
2 977999 6214440.2 0 977999
3 434825 1553610.1 0 434825
4 91202 388402.5 0 91202
5 12431 97100.6 0 12431
6 1401 24275.2 0 1401
7 3528 6068.8 0 3528
8 1486 1517.2 0 1486
9 797 379.3 0 67 730
10 1889 94.8 1 19 1870
11 578 23.7 1 61 517
12 260 5.9 1 2 258
13 64 1.5 1 0 64
14 2 0.4 1 0 2
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_15.fastq
=============================================
99431044 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 246071 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
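A note on reading the "expect" column in the tables above: the values are consistent with the number of reads multiplied by 0.25 raised to the match length, i.e. the number of matches expected by chance if all four bases were equally likely. The sketch below reproduces the first rows for zr1394_15.fastq under that assumption; the function name is hypothetical and this is not taken from cutadapt's source.

```python
def expected_random_matches(total_reads, length):
    """Matches of `length` bp expected by chance, assuming uniform base frequencies."""
    return total_reads * 0.25 ** length


total_reads = 99431044  # "Total reads processed" for zr1394_15.fastq
for length in (1, 2, 3):
    print(length, round(expected_random_matches(total_reads, length), 1))
```

With a required overlap of only 1 bp, the observed 1 bp count (37,048,294) is well above this chance expectation, while the 2 to 6 bp counts fall below it.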
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_16.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_16.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_16_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_16.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
100000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_16.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2326.77 s (22 us/read; 2.76 M reads/minute).
=== Summary ===
Total reads processed: 107,049,378
Reads with adapters: 41,652,769 (38.9%)
Reads written (passing filters): 107,049,378 (100.0%)
Total basepairs processed: 5,374,514,534 bp
Quality-trimmed: 7,254,396 bp (0.1%)
Total written (filtered): 5,323,303,062 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 41652769 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.2%
C: 10.4%
G: 3.7%
T: 39.2%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 40086494 26762344.5 0 40086494
2 1001458 6690586.1 0 1001458
3 446545 1672646.5 0 446545
4 94696 418161.6 0 94696
5 13189 104540.4 0 13189
6 1476 26135.1 0 1476
7 4056 6533.8 0 4056
8 1453 1633.4 0 1453
9 836 408.4 0 88 748
10 1717 102.1 1 12 1705
11 513 25.5 1 51 462
12 277 6.4 1 1 276
13 57 1.6 1 0 57
14 2 0.4 1 0 2
RUN STATISTICS FOR INPUT FILE: zr1394_16.fastq
=============================================
107049378 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 238483 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
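The "No. of allowed errors" line in each report follows from the maximum error rate of 0.1: the number of mismatches tolerated appears to be the match length times the error rate, rounded down. A small sketch, assuming that rule (the function name is hypothetical):

```python
def allowed_errors(overlap_length, max_error_rate=0.1):
    """Mismatches tolerated for an adapter match of the given length."""
    return int(overlap_length * max_error_rate)


# Boundaries of the ranges printed in the report: 0-9, 10-19, 20-29, 30-34 bp.
for n in (9, 10, 19, 20, 29, 30, 34):
    print(n, allowed_errors(n))
```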
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_17.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_17.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_17_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_17.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_17.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1443.84 s (22 us/read; 2.76 M reads/minute).
=== Summary ===
Total reads processed: 66,494,243
Reads with adapters: 26,051,518 (39.2%)
Reads written (passing filters): 66,494,243 (100.0%)
Total basepairs processed: 3,331,739,557 bp
Quality-trimmed: 4,841,602 bp (0.1%)
Total written (filtered): 3,299,106,602 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 26051518 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 45.5%
C: 11.6%
G: 4.5%
T: 37.9%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 24882822 16623560.8 0 24882822
2 729325 4155890.2 0 729325
3 355256 1038972.5 0 355256
4 65331 259743.1 0 65331
5 9989 64935.8 0 9989
6 1186 16233.9 0 1186
7 2773 4058.5 0 2773
8 1590 1014.6 0 1590
9 646 253.7 0 83 563
10 1447 63.4 1 23 1424
11 671 15.9 1 137 534
12 349 4.0 1 6 343
13 127 1.0 1 0 127
14 3 0.2 1 0 3
15 2 0.1 1 0 2
21 1 0.0 2 0 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_17.fastq
=============================================
66494243 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 168200 (0.3%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
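Since the same summary repeats for every file, the key numbers could be pulled out programmatically rather than read by eye. The following is a rough sketch that parses a few summary lines with regular expressions; the patterns are written against the log text shown here and may need adjusting for other Trim Galore or cutadapt versions. The names `SUMMARY_PATTERNS` and `parse_report` are hypothetical.

```python
import re

# Patterns matched against the report text as it appears in this log.
SUMMARY_PATTERNS = {
    "total_reads": re.compile(r"Total reads processed:\s+([\d,]+)"),
    "reads_with_adapters": re.compile(r"Reads with adapters:\s+([\d,]+)"),
    "too_short": re.compile(r"shorter than the length cutoff of \d+ bp:\s+(\d+)"),
}


def parse_report(text):
    """Extract integer counts from one file's trimming report text."""
    stats = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            stats[key] = int(match.group(1).replace(",", ""))
    return stats


# Lines copied from the zr1394_17.fastq report above.
report = """\
Total reads processed: 66,494,243
Reads with adapters: 26,051,518 (39.2%)
Sequences removed because they became shorter than the length cutoff of 20 bp: 168200 (0.3%)
"""
stats = parse_report(report)
pct_adapters = 100.0 * stats["reads_with_adapters"] / stats["total_reads"]
print(stats, round(pct_adapters, 1))
```

In practice the text would come from each `*_trimming_report.txt` file in the Galore directory rather than from an inline string.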
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_18.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_18.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_18_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_18.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_18.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1710.81 s (22 us/read; 2.75 M reads/minute).
=== Summary ===
Total reads processed: 78,526,645
Reads with adapters: 31,435,977 (40.0%)
Reads written (passing filters): 78,526,645 (100.0%)
Total basepairs processed: 3,950,526,343 bp
Quality-trimmed: 4,519,798 bp (0.1%)
Total written (filtered): 3,912,747,519 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 31435977 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 47.1%
C: 10.9%
G: 3.5%
T: 38.0%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 30174789 19631661.2 0 30174789
2 826848 4907915.3 0 826848
3 342991 1226978.8 0 342991
4 76595 306744.7 0 76595
5 7845 76686.2 0 7845
6 993 19171.5 0 993
7 2609 4792.9 0 2609
8 913 1198.2 0 913
9 534 299.6 0 39 495
10 1171 74.9 1 15 1156
11 410 18.7 1 38 372
12 216 4.7 1 3 213
13 62 1.2 1 0 62
14 1 0.3 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_18.fastq
=============================================
78526645 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 141402 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_1.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1619.13 s (22 us/read; 2.75 M reads/minute).
=== Summary ===
Total reads processed: 74,152,340
Reads with adapters: 29,882,662 (40.3%)
Reads written (passing filters): 74,152,340 (100.0%)
Total basepairs processed: 3,723,478,759 bp
Quality-trimmed: 4,368,927 bp (0.1%)
Total written (filtered): 3,687,466,988 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 29882662 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.5%
C: 10.9%
G: 3.8%
T: 38.4%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 28696845 18538085.0 0 28696845
2 743935 4634521.2 0 743935
3 352757 1158630.3 0 352757
4 72520 289657.6 0 72520
5 8672 72414.4 0 8672
6 1069 18103.6 0 1069
7 2624 4525.9 0 2624
8 1087 1131.5 0 1087
9 572 282.9 0 78 494
10 1340 70.7 1 32 1308
11 646 17.7 1 63 583
12 460 4.4 1 7 453
13 124 1.1 1 0 124
14 11 0.3 1 0 11
RUN STATISTICS FOR INPUT FILE: zr1394_1.fastq
=============================================
74152340 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 173839 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_2.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_2.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_2_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_2.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_2.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1610.50 s (22 us/read; 2.74 M reads/minute).
=== Summary ===
Total reads processed: 73,488,249
Reads with adapters: 29,581,400 (40.3%)
Reads written (passing filters): 73,488,249 (100.0%)
Total basepairs processed: 3,690,821,482 bp
Quality-trimmed: 4,676,110 bp (0.1%)
Total written (filtered): 3,654,734,368 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 29581400 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.4%
C: 10.9%
G: 3.9%
T: 38.4%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 28338793 18372062.2 0 28338793
2 790537 4593015.6 0 790537
3 359878 1148253.9 0 359878
4 76003 287063.5 0 76003
5 8002 71765.9 0 8002
6 1107 17941.5 0 1107
7 2913 4485.4 0 2913
8 1090 1121.3 0 1090
9 661 280.3 0 49 612
10 1375 70.1 1 22 1353
11 574 17.5 1 67 507
12 365 4.4 1 2 363
13 95 1.1 1 0 95
14 5 0.3 1 0 5
15 2 0.1 1 0 2
RUN STATISTICS FOR INPUT FILE: zr1394_2.fastq
=============================================
73488249 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 173387 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_3.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_3.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_3_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_3.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_3.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1666.13 s (22 us/read; 2.75 M reads/minute).
=== Summary ===
Total reads processed: 76,394,852
Reads with adapters: 30,845,578 (40.4%)
Reads written (passing filters): 76,394,852 (100.0%)
Total basepairs processed: 3,838,964,335 bp
Quality-trimmed: 4,690,922 bp (0.1%)
Total written (filtered): 3,801,713,441 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 30845578 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.7%
C: 10.5%
G: 3.4%
T: 38.9%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 29677699 19098713.0 0 29677699
2 743644 4774678.2 0 743644
3 338034 1193669.6 0 338034
4 72016 298417.4 0 72016
5 7418 74604.3 0 7418
6 914 18651.1 0 914
7 2454 4662.8 0 2454
8 905 1165.7 0 905
9 474 291.4 0 46 428
10 1168 72.9 1 30 1138
11 452 18.2 1 35 417
12 301 4.6 1 5 296
13 90 1.1 1 0 90
14 8 0.3 1 0 8
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_3.fastq
=============================================
76394852 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 170874 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_4.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_4.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_4_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_4.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_4.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1379.30 s (22 us/read; 2.79 M reads/minute).
=== Summary ===
Total reads processed: 64,145,277
Reads with adapters: 25,621,930 (39.9%)
Reads written (passing filters): 64,145,277 (100.0%)
Total basepairs processed: 3,223,836,346 bp
Quality-trimmed: 3,945,438 bp (0.1%)
Total written (filtered): 3,192,787,814 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 25621930 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.9%
C: 10.8%
G: 3.6%
T: 38.2%
none/other: 0.5%
Overview of removed sequences
length count expect max.err error counts
1 24609640 16036319.2 0 24609640
2 650690 4009079.8 0 650690
3 287659 1002270.0 0 287659
4 60595 250567.5 0 60595
5 7059 62641.9 0 7059
6 941 15660.5 0 941
7 2198 3915.1 0 2198
8 878 978.8 0 878
9 508 244.7 0 48 460
10 1022 61.2 1 17 1005
11 402 15.3 1 56 346
12 249 3.8 1 6 243
13 83 1.0 1 0 83
14 5 0.2 1 0 5
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_4.fastq
=============================================
64145277 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 141561 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_5.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_5.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_5_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_5.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_5.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2020.30 s (22 us/read; 2.71 M reads/minute).
=== Summary ===
Total reads processed: 91,405,786
Reads with adapters: 37,527,806 (41.1%)
Reads written (passing filters): 91,405,786 (100.0%)
Total basepairs processed: 4,589,647,758 bp
Quality-trimmed: 5,489,653 bp (0.1%)
Total written (filtered): 4,544,267,493 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 37527806 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.3%
C: 11.3%
G: 4.0%
T: 38.1%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 35935034 22851446.5 0 35935034
2 995142 5712861.6 0 995142
3 478719 1428215.4 0 478719
4 99132 357053.9 0 99132
5 9653 89263.5 0 9653
6 1304 22315.9 0 1304
7 3242 5579.0 0 3242
8 1678 1394.7 0 1678
9 913 348.7 0 107 806
10 1542 87.2 1 30 1512
11 771 21.8 1 99 672
12 515 5.4 1 6 509
13 152 1.4 1 0 152
14 7 0.3 1 0 7
15 2 0.1 1 0 2
RUN STATISTICS FOR INPUT FILE: zr1394_5.fastq
=============================================
91405786 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 211704 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_6.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_6.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_6_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_6.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_6.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1567.29 s (22 us/read; 2.76 M reads/minute).
=== Summary ===
Total reads processed: 72,057,411
Reads with adapters: 29,789,932 (41.3%)
Reads written (passing filters): 72,057,411 (100.0%)
Total basepairs processed: 3,623,922,208 bp
Quality-trimmed: 4,012,188 bp (0.1%)
Total written (filtered): 3,588,483,613 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 29789932 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 47.4%
C: 10.7%
G: 3.2%
T: 38.3%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 28664478 18014352.8 0 28664478
2 730404 4503588.2 0 730404
3 310812 1125897.0 0 310812
4 72410 281474.3 0 72410
5 5841 70368.6 0 5841
6 872 17592.1 0 872
7 1957 4398.0 0 1957
8 803 1099.5 0 803
9 475 274.9 0 45 430
10 1015 68.7 1 21 994
11 443 17.2 1 45 398
12 312 4.3 1 1 311
13 98 1.1 1 0 98
14 11 0.3 1 0 11
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_6.fastq
=============================================
72057411 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 154325 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_7.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_7.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_7_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_7.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_7.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1715.77 s (22 us/read; 2.77 M reads/minute).
=== Summary ===
Total reads processed: 79,104,781
Reads with adapters: 31,719,551 (40.1%)
Reads written (passing filters): 79,104,781 (100.0%)
Total basepairs processed: 3,970,724,518 bp
Quality-trimmed: 5,133,460 bp (0.1%)
Total written (filtered): 3,932,020,040 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 31719551 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.3%
C: 10.9%
G: 3.8%
T: 38.6%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 30468196 19776195.2 0 30468196
2 794484 4944048.8 0 794484
3 362213 1236012.2 0 362213
4 76147 309003.1 0 76147
5 9400 77250.8 0 9400
6 1192 19312.7 0 1192
7 3028 4828.2 0 3028
8 1479 1207.0 0 1479
9 729 301.8 0 97 632
10 1371 75.4 1 25 1346
11 720 18.9 1 84 636
12 463 4.7 1 8 455
13 113 1.2 1 0 113
14 10 0.3 1 0 10
15 5 0.1 1 0 5
16 1 0.0 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_7.fastq
=============================================
79104781 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 193543 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_8.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_8.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_8_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_8.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_8.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 1361.42 s (22 us/read; 2.75 M reads/minute).
=== Summary ===
Total reads processed: 62,466,580
Reads with adapters: 24,912,705 (39.9%)
Reads written (passing filters): 62,466,580 (100.0%)
Total basepairs processed: 3,138,210,719 bp
Quality-trimmed: 4,000,171 bp (0.1%)
Total written (filtered): 3,107,836,344 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 24912705 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 46.9%
C: 10.6%
G: 3.7%
T: 38.3%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 23913925 15616645.0 0 23913925
2 638385 3904161.2 0 638385
3 289073 976040.3 0 289073
4 58745 244010.1 0 58745
5 6741 61002.5 0 6741
6 800 15250.6 0 800
7 2225 3812.7 0 2225
8 725 953.2 0 725
9 423 238.3 0 36 387
10 1007 59.6 1 24 983
11 394 14.9 1 41 353
12 196 3.7 1 1 195
13 58 0.9 1 0 58
14 7 0.2 1 0 7
15 1 0.1 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_8.fastq
=============================================
62466580 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 140685 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
1.12
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to '/home/sean/Documents/Bismark-Mapping-Eff/Galore/zr1394_9.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: zr1394_9.fastq
Trimming mode: single-end
Trim Galore version: 0.4.2
Cutadapt version: 1.12
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' ()
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp
All Read 1 sequences will be trimmed by 3 bp from their 5' end to avoid poor qualities or biases
All Read 1 sequences will be trimmed by 3 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to zr1394_9_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' from file zr1394_9.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
80000000 sequences processed
90000000 sequences processed
100000000 sequences processed
110000000 sequences processed
120000000 sequences processed
This is cutadapt 1.12 with Python 2.7.12
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA zr1394_9.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 2668.85 s (22 us/read; 2.73 M reads/minute).
=== Summary ===
Total reads processed: 121,559,470
Reads with adapters: 49,649,167 (40.8%)
Reads written (passing filters): 121,559,470 (100.0%)
Total basepairs processed: 6,112,788,269 bp
Quality-trimmed: 7,092,838 bp (0.1%)
Total written (filtered): 6,053,405,265 bp (99.0%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 49649167 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3
Bases preceding removed adapters:
A: 47.1%
C: 10.4%
G: 3.1%
T: 38.9%
none/other: 0.4%
Overview of removed sequences
length count expect max.err error counts
1 47823355 30389867.5 0 47823355
2 1190767 7597466.9 0 1190767
3 500538 1899366.7 0 500538
4 114801 474841.7 0 114801
5 11070 118710.4 0 11070
6 1272 29677.6 0 1272
7 3350 7419.4 0 3350
8 1216 1854.9 0 1216
9 720 463.7 0 69 651
10 1362 115.9 1 10 1352
11 444 29.0 1 43 401
12 222 7.2 1 2 220
13 49 1.8 1 0 49
14 1 0.5 1 0 1
RUN STATISTICS FOR INPUT FILE: zr1394_9.fastq
=============================================
121559470 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 236908 (0.2%)
>>> Now running FastQC on the data <<<
Can't exec "fastqc": No such file or directory at /home/shared/trimgalore/trim_galore line 771.
galore.time2 <- proc.time() - galore.time
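A note on the timing pattern used throughout this notebook: `proc.time` is a function, so it has to be called (with parentheses) before its result can be subtracted. Below is a minimal sketch of that pattern. The helper name `time_step()` is hypothetical and not part of the original notebook, and `Sys.sleep()` only stands in for the real `system()` call.

```r
# Hypothetical helper: evaluate an expression and return elapsed wall-clock seconds.
time_step <- function(expr) {
  start <- proc.time()            # note the parentheses: proc.time is a function
  force(expr)                     # evaluate the lazily passed expression now
  elapsed <- proc.time() - start  # end minus start, so the values are positive
  elapsed[["elapsed"]]            # keep only the wall-clock component
}

# Stand-in for a long-running system() call such as the Trim Galore run.
galore_seconds <- time_step(Sys.sleep(0.2))
print(galore_seconds)
```

Base R's `system.time()` does much the same thing in one call, so it may be the simpler choice when only one step needs timing.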
Now we’ll re-run FastQC to check whether anything has changed.
time.clean.fastqc <- proc.time()
system("/home/shared/fastQC/fastqc ~/Documents/Bismark-Mapping-Eff/Cleaned/*.fastq")
time.clean.fastqc2 <- proc.time() - time.clean.fastqc
print(time.clean.fastqc2)
The Babraham paper suggests trimming with Trim Galore using the --trim1 argument, but that option is meant for paired-end reads. We have single-end reads here, so it isn’t applicable to us.
Update after one night of running: I apparently don’t have FastQC on my PATH, so the --fastqc call in Trim Galore couldn’t find it. Oops. So I guess we get to run FastQC on everything manually. Ah well.
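One possible way to avoid the missing-FastQC problem next time is to put the FastQC directory on the PATH before launching Trim Galore from R. This is a sketch only. It assumes Trim Galore looks up `fastqc` by its bare name (which is what the "Can't exec" message suggests), and the helper `prepend_to_path()` is hypothetical.

```r
# Hypothetical helper: prepend a directory to PATH for this R session.
# Child processes started with system() inherit the modified PATH.
prepend_to_path <- function(dir) {
  old_path <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(dir, old_path, sep = .Platform$path.sep))
  invisible(old_path)  # return the previous value so it can be restored
}

# Directory taken from the FastQC calls elsewhere in this notebook.
fastqc_dir <- "/home/shared/fastQC"
prepend_to_path(fastqc_dir)

# Sys.which() returns "" when the executable cannot be found on PATH.
fastqc_found <- nzchar(Sys.which("fastqc"))
print(fastqc_found)
```

If `fastqc_found` is `FALSE`, the directory or the executable name is probably wrong, and it is cheaper to find that out here than after an overnight run.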
setwd("~/Documents/Bismark-Mapping-Eff/Cleaned")
The working directory was changed to /home/sean/Documents/Bismark-Mapping-Eff/Cleaned inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
system("/home/shared/fastQC/fastqc ~/Documents/Bismark-Mapping-Eff/Cleaned/*.fastq")
Started analysis of cleaned_zr1394_1.fastq
Approx 5% complete for cleaned_zr1394_1.fastq
Approx 10% complete for cleaned_zr1394_1.fastq
Approx 15% complete for cleaned_zr1394_1.fastq
Approx 20% complete for cleaned_zr1394_1.fastq
Approx 25% complete for cleaned_zr1394_1.fastq
Approx 30% complete for cleaned_zr1394_1.fastq
Approx 35% complete for cleaned_zr1394_1.fastq
Approx 40% complete for cleaned_zr1394_1.fastq
Approx 45% complete for cleaned_zr1394_1.fastq
Approx 50% complete for cleaned_zr1394_1.fastq
Approx 55% complete for cleaned_zr1394_1.fastq
Approx 60% complete for cleaned_zr1394_1.fastq
Approx 65% complete for cleaned_zr1394_1.fastq
Approx 70% complete for cleaned_zr1394_1.fastq
Approx 75% complete for cleaned_zr1394_1.fastq
Approx 80% complete for cleaned_zr1394_1.fastq
Approx 85% complete for cleaned_zr1394_1.fastq
Approx 90% complete for cleaned_zr1394_1.fastq
Approx 95% complete for cleaned_zr1394_1.fastq
Analysis complete for cleaned_zr1394_1.fastq
Started analysis of cleaned_zr1394_10.fastq
Approx 5% complete for cleaned_zr1394_10.fastq
Approx 10% complete for cleaned_zr1394_10.fastq
Approx 15% complete for cleaned_zr1394_10.fastq
Approx 20% complete for cleaned_zr1394_10.fastq
Approx 25% complete for cleaned_zr1394_10.fastq
Approx 30% complete for cleaned_zr1394_10.fastq
Approx 35% complete for cleaned_zr1394_10.fastq
Approx 40% complete for cleaned_zr1394_10.fastq
Approx 45% complete for cleaned_zr1394_10.fastq
Approx 50% complete for cleaned_zr1394_10.fastq
Approx 55% complete for cleaned_zr1394_10.fastq
Approx 60% complete for cleaned_zr1394_10.fastq
Approx 65% complete for cleaned_zr1394_10.fastq
Approx 70% complete for cleaned_zr1394_10.fastq
Approx 75% complete for cleaned_zr1394_10.fastq
Approx 80% complete for cleaned_zr1394_10.fastq
Approx 85% complete for cleaned_zr1394_10.fastq
Approx 90% complete for cleaned_zr1394_10.fastq
Approx 95% complete for cleaned_zr1394_10.fastq
Analysis complete for cleaned_zr1394_10.fastq
Started analysis of cleaned_zr1394_11.fastq
Approx 5% complete for cleaned_zr1394_11.fastq
Approx 10% complete for cleaned_zr1394_11.fastq
Approx 15% complete for cleaned_zr1394_11.fastq
Approx 20% complete for cleaned_zr1394_11.fastq
Approx 25% complete for cleaned_zr1394_11.fastq
Approx 30% complete for cleaned_zr1394_11.fastq
Approx 35% complete for cleaned_zr1394_11.fastq
Approx 40% complete for cleaned_zr1394_11.fastq
Approx 45% complete for cleaned_zr1394_11.fastq
Approx 50% complete for cleaned_zr1394_11.fastq
Approx 55% complete for cleaned_zr1394_11.fastq
Approx 60% complete for cleaned_zr1394_11.fastq
Approx 65% complete for cleaned_zr1394_11.fastq
Approx 70% complete for cleaned_zr1394_11.fastq
Approx 75% complete for cleaned_zr1394_11.fastq
Approx 80% complete for cleaned_zr1394_11.fastq
Approx 85% complete for cleaned_zr1394_11.fastq
Approx 90% complete for cleaned_zr1394_11.fastq
Approx 95% complete for cleaned_zr1394_11.fastq
Analysis complete for cleaned_zr1394_11.fastq
Started analysis of cleaned_zr1394_12.fastq
Approx 5% complete for cleaned_zr1394_12.fastq
Approx 10% complete for cleaned_zr1394_12.fastq
Approx 15% complete for cleaned_zr1394_12.fastq
Approx 20% complete for cleaned_zr1394_12.fastq
Approx 25% complete for cleaned_zr1394_12.fastq
Approx 30% complete for cleaned_zr1394_12.fastq
Approx 35% complete for cleaned_zr1394_12.fastq
Approx 40% complete for cleaned_zr1394_12.fastq
Approx 45% complete for cleaned_zr1394_12.fastq
Approx 50% complete for cleaned_zr1394_12.fastq
Approx 55% complete for cleaned_zr1394_12.fastq
Approx 60% complete for cleaned_zr1394_12.fastq
Approx 65% complete for cleaned_zr1394_12.fastq
Approx 70% complete for cleaned_zr1394_12.fastq
Approx 75% complete for cleaned_zr1394_12.fastq
Approx 80% complete for cleaned_zr1394_12.fastq
Approx 85% complete for cleaned_zr1394_12.fastq
Approx 90% complete for cleaned_zr1394_12.fastq
Approx 95% complete for cleaned_zr1394_12.fastq
Analysis complete for cleaned_zr1394_12.fastq
Started analysis of cleaned_zr1394_13.fastq
Approx 5% complete for cleaned_zr1394_13.fastq
Approx 10% complete for cleaned_zr1394_13.fastq
Approx 15% complete for cleaned_zr1394_13.fastq
Approx 20% complete for cleaned_zr1394_13.fastq
Approx 25% complete for cleaned_zr1394_13.fastq
Approx 30% complete for cleaned_zr1394_13.fastq
Approx 35% complete for cleaned_zr1394_13.fastq
Approx 40% complete for cleaned_zr1394_13.fastq
Approx 45% complete for cleaned_zr1394_13.fastq
Approx 50% complete for cleaned_zr1394_13.fastq
Approx 55% complete for cleaned_zr1394_13.fastq
Approx 60% complete for cleaned_zr1394_13.fastq
Approx 65% complete for cleaned_zr1394_13.fastq
Approx 70% complete for cleaned_zr1394_13.fastq
Approx 75% complete for cleaned_zr1394_13.fastq
Approx 80% complete for cleaned_zr1394_13.fastq
Approx 85% complete for cleaned_zr1394_13.fastq
Approx 90% complete for cleaned_zr1394_13.fastq
Approx 95% complete for cleaned_zr1394_13.fastq
Analysis complete for cleaned_zr1394_13.fastq
Started analysis of cleaned_zr1394_14.fastq
Approx 5% complete for cleaned_zr1394_14.fastq
Approx 10% complete for cleaned_zr1394_14.fastq
Approx 15% complete for cleaned_zr1394_14.fastq
Approx 20% complete for cleaned_zr1394_14.fastq
Approx 25% complete for cleaned_zr1394_14.fastq
Approx 30% complete for cleaned_zr1394_14.fastq
Approx 35% complete for cleaned_zr1394_14.fastq
Approx 40% complete for cleaned_zr1394_14.fastq
Approx 45% complete for cleaned_zr1394_14.fastq
Approx 50% complete for cleaned_zr1394_14.fastq
Approx 55% complete for cleaned_zr1394_14.fastq
Approx 60% complete for cleaned_zr1394_14.fastq
Approx 65% complete for cleaned_zr1394_14.fastq
Approx 70% complete for cleaned_zr1394_14.fastq
Approx 75% complete for cleaned_zr1394_14.fastq
Approx 80% complete for cleaned_zr1394_14.fastq
Approx 85% complete for cleaned_zr1394_14.fastq
Approx 90% complete for cleaned_zr1394_14.fastq
Approx 95% complete for cleaned_zr1394_14.fastq
Analysis complete for cleaned_zr1394_14.fastq
Started analysis of cleaned_zr1394_15.fastq
Approx 5% complete for cleaned_zr1394_15.fastq
Approx 10% complete for cleaned_zr1394_15.fastq
Approx 15% complete for cleaned_zr1394_15.fastq
Approx 20% complete for cleaned_zr1394_15.fastq
Approx 25% complete for cleaned_zr1394_15.fastq
Approx 30% complete for cleaned_zr1394_15.fastq
Approx 35% complete for cleaned_zr1394_15.fastq
Approx 40% complete for cleaned_zr1394_15.fastq
Approx 45% complete for cleaned_zr1394_15.fastq
Approx 50% complete for cleaned_zr1394_15.fastq
Approx 55% complete for cleaned_zr1394_15.fastq
Approx 60% complete for cleaned_zr1394_15.fastq
Approx 65% complete for cleaned_zr1394_15.fastq
Approx 70% complete for cleaned_zr1394_15.fastq
Approx 75% complete for cleaned_zr1394_15.fastq
Approx 80% complete for cleaned_zr1394_15.fastq
Approx 85% complete for cleaned_zr1394_15.fastq
Approx 90% complete for cleaned_zr1394_15.fastq
Approx 95% complete for cleaned_zr1394_15.fastq
Analysis complete for cleaned_zr1394_15.fastq
Started analysis of cleaned_zr1394_16.fastq
Approx 5% complete for cleaned_zr1394_16.fastq
Approx 10% complete for cleaned_zr1394_16.fastq
Approx 15% complete for cleaned_zr1394_16.fastq
Approx 20% complete for cleaned_zr1394_16.fastq
Approx 25% complete for cleaned_zr1394_16.fastq
Moving on to the Bismark steps. First, we prep the reference genome.
#system("/home/shared/Bismark/bismark_genome_preparation ~/Documents/Bismark-Mapping-Eff/Bowtie1/")
#system("/home/shared/Bismark/bismark_genome_preparation --bowtie2 ~/Documents/Bismark-Mapping-Eff/Bowtie2/")
Now we’re going to run the Bismark aligner a few different ways. I’m trying both Trimmomatic and Trim Galore trimmed files, with N arguments of 0 and 1 for Bismark. N indicates the number of high-quality mismatches allowed, and may not have a measurable effect on mapping efficiency, but it’s worth a shot I suppose?
# # The pattern is a regular expression, not a glob: "\\.fastq$" matches only names ending in ".fastq"
# trimmo.file.names <- list.files(path = "~/Documents/Bismark-Mapping-Eff/Cleaned", pattern = "\\.fastq$")
#
# galore.file.names <- list.files(path = "~/Documents/Bismark-Mapping-Eff/Galore", pattern = "\\.fastq$")
#
# # list.files() returns a character vector, so loop with seq_along() rather than nrow()
# for(i in seq_along(trimmo.file.names)) {
#
# system(paste0("/home/shared/Bismark/bismark -n 0 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/Bismark-Mapping-Eff/Cleaned/", trimmo.file.names[i]," --output_dir ~/Documents/Bismark-Mapping-Eff/Trimmo-Out-0 "))
#
# }
#
# for(i in seq_along(trimmo.file.names)) {
#
# system(paste0("/home/shared/Bismark/bismark -n 1 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/Bismark-Mapping-Eff/Cleaned/", trimmo.file.names[i]," --output_dir ~/Documents/Bismark-Mapping-Eff/Trimmo-Out-1 "))
#
# }
#
# # The Trim Galore files are listed from Galore/, so read them from Galore/ as well (not Cleaned/)
# for(i in seq_along(galore.file.names)) {
#
# system(paste0("/home/shared/Bismark/bismark -n 0 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/Bismark-Mapping-Eff/Galore/", galore.file.names[i]," --output_dir ~/Documents/Bismark-Mapping-Eff/Galore-Out-0 "))
#
# }
#
# for(i in seq_along(galore.file.names)) {
#
# system(paste0("/home/shared/Bismark/bismark -n 1 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/Bismark-Mapping-Eff/Galore/", galore.file.names[i]," --output_dir ~/Documents/Bismark-Mapping-Eff/Galore-Out-1 "))
#
# }
# The pattern is a regular expression, not a glob: match only names ending in ".fastq"
file.names.list <- list.files(path = "~/Documents/test", pattern = "\\.fastq$")
# NOTE: all three runs share one output directory, so later runs may overwrite earlier output
for(i in seq_along(file.names.list)) {
  # Run with -n 0 and report the elapsed time
  time.0 <- proc.time()
  system(paste0("/home/shared/Bismark/bismark --multicore 8 -n 0 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/test/", file.names.list[i], " --output_dir ~/Documents/test/"))
  time.0 <- proc.time() - time.0  # end minus start, so the elapsed time is positive
  print(time.0)
  # Run with -n 1
  time.1 <- proc.time()
  system(paste0("/home/shared/Bismark/bismark --multicore 8 -n 1 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/test/", file.names.list[i], " --output_dir ~/Documents/test/"))
  time.1 <- proc.time() - time.1
  print(time.1)
  # Run with -n 2
  time.2 <- proc.time()
  system(paste0("/home/shared/Bismark/bismark --multicore 8 -n 2 --genome ~/Documents/Bismark-Mapping-Eff/Bowtie2/ ~/Documents/test/", file.names.list[i], " --output_dir ~/Documents/test/"))
  time.2 <- proc.time() - time.2
  print(time.2)
}
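Since every Bismark run in the loop writes to the same output directory, results from different -n settings may end up overwriting each other. One way to keep them apart is to build the commands up front, with one output sub-directory per setting. This is a sketch only: the helper `build_bismark_cmds()` and the `n0`/`n1`/`n2` directory names are hypothetical, the example file names are taken from the bsmap log, and only Bismark flags already used in this notebook appear.

```r
# Hypothetical helper: build one Bismark command per (file, -n) combination.
build_bismark_cmds <- function(files, n_values,
                               bismark  = "/home/shared/Bismark/bismark",
                               genome   = "~/Documents/Bismark-Mapping-Eff/Bowtie2/",
                               in_dir   = "~/Documents/test",
                               out_root = "~/Documents/test") {
  # One row per combination of input file and -n value.
  runs <- expand.grid(file = files, n = n_values, stringsAsFactors = FALSE)
  # Assumed layout: one sub-directory per -n value, e.g. <out_root>/n0
  runs$out_dir <- file.path(out_root, paste0("n", runs$n))
  runs$cmd <- paste0(bismark, " --multicore 8 -n ", runs$n,
                     " --genome ", genome, " ",
                     file.path(in_dir, runs$file),
                     " --output_dir ", runs$out_dir)
  runs
}

runs <- build_bismark_cmds(c("zr1394_8.fastq", "zr1394_8_trimmed.fastq"), 0:2)
print(runs$cmd[1])
# Each command could then be executed with system(runs$cmd[i]).
```

Printing the commands before running them also makes it easy to spot a wrong path before committing to a multi-hour alignment.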
for(i in 1:length(file.names.list)) {
system(paste0("bsmap -a ~/Documents/test/", file.names.list[i], " -d ~/Documents/Bismark-Mapping-Eff/Bowtie1/Ostrea_lurida-Scaff-10k.fa -o ~/Documents/test/", file.names.list[i], ".sam"))
}
[bsmap] @Thu Dec 22 15:03:55 2016 loading reference file: /home/sean/Documents/Bismark-Mapping-Eff/Bowtie1/Ostrea_lurida-Scaff-10k.fa (format: FASTA)
[bsmap] @Thu Dec 22 15:03:56 2016 8733 reference seqs loaded, total size 131223850 bp. 1 secs passed
[bsmap] @Thu Dec 22 15:04:03 2016 create seed table. 8 secs passed
[bsmap] @Thu Dec 22 15:04:03 2016 Single-end alignment(8 threads),
Input read file: /home/sean/Documents/test/cleaned_zr1394_8.fastq (format: FASTQ)
Output file: /home/sean/Documents/test/cleaned_zr1394_8.fastq.sam (format: SAM)
[bsmap] @Thu Dec 22 15:24:59 2016 total reads: 57683509 total time: 1264 secs
aligned reads: 6365355 (11.0%), unique reads: 3771203 (6.5%), non-unique reads: 2594152 (4.5%)
[bsmap] @Thu Dec 22 15:24:59 2016 loading reference file: /home/sean/Documents/Bismark-Mapping-Eff/Bowtie1/Ostrea_lurida-Scaff-10k.fa (format: FASTA)
[bsmap] @Thu Dec 22 15:25:01 2016 8733 reference seqs loaded, total size 131223850 bp. 2 secs passed
[bsmap] @Thu Dec 22 15:25:06 2016 create seed table. 7 secs passed
[bsmap] @Thu Dec 22 15:25:06 2016 Single-end alignment(8 threads),
Input read file: /home/sean/Documents/test/zr1394_8.fastq (format: FASTQ)
Output file: /home/sean/Documents/test/zr1394_8.fastq.sam (format: SAM)
[bsmap] @Thu Dec 22 15:47:31 2016 total reads: 62466580 total time: 1352 secs
aligned reads: 6066622 (9.7%), unique reads: 3655578 (5.9%), non-unique reads: 2411044 (3.9%)
[bsmap] @Thu Dec 22 15:47:31 2016 loading reference file: /home/sean/Documents/Bismark-Mapping-Eff/Bowtie1/Ostrea_lurida-Scaff-10k.fa (format: FASTA)
[bsmap] @Thu Dec 22 15:47:32 2016 8733 reference seqs loaded, total size 131223850 bp. 1 secs passed
[bsmap] @Thu Dec 22 15:47:37 2016 create seed table. 6 secs passed
[bsmap] @Thu Dec 22 15:47:37 2016 Single-end alignment(8 threads),
Input read file: /home/sean/Documents/test/zr1394_8_trimmed.fastq (format: FASTQ)
Output file: /home/sean/Documents/test/zr1394_8_trimmed.fastq.sam (format: SAM)
[bsmap] @Thu Dec 22 16:09:51 2016 total reads: 62325895 total time: 1340 secs
aligned reads: 8104016 (13.0%), unique reads: 4636019 (7.4%), non-unique reads: 3467997 (5.6%)
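To compare the three bsmap runs side by side, the summary lines can be parsed into a small table. The sketch below copies the numbers straight from the log above. The helper `parse_bsmap_line()` is hypothetical, and the regular expression assumes the summary line keeps the format shown here.

```r
# Summary lines copied from the bsmap output above, keyed by input file.
bsmap_summary <- c(
  cleaned_zr1394_8 = "aligned reads: 6365355 (11.0%), unique reads: 3771203 (6.5%), non-unique reads: 2594152 (4.5%)",
  zr1394_8         = "aligned reads: 6066622 (9.7%), unique reads: 3655578 (5.9%), non-unique reads: 2411044 (3.9%)",
  zr1394_8_trimmed = "aligned reads: 8104016 (13.0%), unique reads: 4636019 (7.4%), non-unique reads: 3467997 (5.6%)"
)

# Hypothetical helper: pull the three read counts out of one summary line.
parse_bsmap_line <- function(line) {
  hits <- regmatches(line, gregexpr("reads: [0-9]+", line))[[1]]
  counts <- as.numeric(sub("reads: ", "", hits, fixed = TRUE))
  setNames(counts, c("aligned", "unique", "non_unique"))
}

# One row per input file, one column per count.
counts <- t(sapply(bsmap_summary, parse_bsmap_line))

# "total reads" values from the same log, in the same order.
total_reads <- c(57683509, 62466580, 62325895)

efficiency <- data.frame(
  counts,
  pct_aligned = round(100 * counts[, "aligned"] / total_reads, 1)
)
print(efficiency)
```

The recomputed percentages agree with the ones bsmap printed (11.0%, 9.7%, and 13.0%), so for this one sample the Trim Galore output aligned slightly better than either the untrimmed or the cleaned file.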